DOCUMENT RESUME 



ED 065 225                    95                    RC 006 287

TITLE          Bilingual Testing and Assessment, Proceedings of Bay
               Area Bilingual Education League (BABEL) Workshop and
               Preliminary Findings, Multilingual Assessment Program
               (Berkeley, California, January 27-28, 1972).
INSTITUTION    Bay Area Bilingual Education League, Berkeley,
               Calif.; Multilingual Assessment Program, Stockton,
               Calif.
SPONS AGENCY   Office of Education (DHEW), Washington, D.C.
PUB DATE       28 Jan 72
NOTE           122p.

EDRS PRICE     MF-$0.65 HC-$6.58
DESCRIPTORS    Biculturalism; *Bilingual Education; *Conference
               Reports; Culture Free Tests; Intelligence Tests;
               *Minority Groups; Norm Referenced Tests; Tables
               (Data); *Testing; *Test Interpretation; Test
               Reviews



ABSTRACT 

The results and proceedings of the first annual 
Bilingual/Bicultural Testing and Assessment Workshop, held in 
Berkeley, California, on January 27-28, 1972, are presented in this 
publication. Approximately 150 bilingual psychologists and 
evaluators, educators working in bilingual/bicultural programs, and 
community representatives from California and Texas attended. 
Evaluations were made and the summaries are included of 8 tests used 
extensively in bilingual programs: the Wechsler Intelligence Scale 
for Children, the Comprehensive Tests of Basic Skills, the 
Cooperative Primary, the Lorge-Thorndike, the Inter-American
Series (General Ability), the Culture-Fair Intelligence Test, the
Michigan Oral Production Test, and the Peabody Picture Vocabulary 
Test. Also included in this publication are (1) an overview of the 
problem of assessment and evaluation in bilingual education, (2) a 
professional critique of the Inter-American Series by Dr. Barbara
Havassy, (3) a brief description of a Criterion Referenced System 
developed by Eduardo Apodaca, and (4) an article by Dr. Edward A. 
DeAvila discussing some of the complexities involved in testing and 
assessment of bilingual/bicultural children. (NQ) 






BILINGUAL TESTING AND ASSESSMENT 



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL OFFICE
OF EDUCATION POSITION OR POLICY.




BAY AREA BILINGUAL EDUCATION LEAGUE

PROCEEDINGS
of
BABEL WORKSHOP

and

PRELIMINARY FINDINGS
MULTILINGUAL ASSESSMENT PROGRAM






Preface 

This publication is the fruition of a joint effort by the 
Bay Area Bilingual Education League, Berkeley, and the Multi- 
lingual Assessment Program, Stockton. I wish to acknowledge 
Dr. Rene Cardenas, Director of BABEL and Mr. Joe R. Ulibarri, 
Director of the Multilingual Assessment Program, for their 
fine cooperation, interest, and support of this effort. 



It was the intent of this publication to share the results 
and proceedings of the first annual Bilingual/Bicultural 
Testing and Assessment workshop. Enclosed herewith the reader 
will find a test by test summary of the discussion and con- 
clusions reached during the workshop sessions. The reader 
will find that this represents a lay effort to find a solu- 
tion to one of the most pressing problems in Bilingual 
Education, that of Assessment and Evaluation. An overview 
of this problem is also presented so that the reader might 
gain additional insight into the processes that led up to 
the present "state of the art" in Bilingual/Bicultural
Educational Evaluation. We are grateful to be able to include a
professional critique of the Inter-American Series by Dr. 
Barbara Havassy, consultant to the Multilingual Assessment 
Program, Stockton, even though it was not an official part 
of the assessment workshop. Her work, entitled "A Critical 
Review of the New Inter-American Series" should make a timely 



I 

t 

s 



i 

i 



i 



and valuable contribution to those projects contemplating 
use of this instrument. 

A brief description of a Criterion Referenced System developed
by Eduardo Apodaca is also presented and should stimulate the
appetite for further investigation and experimentation with this
methodology.

Dr. Edward A. DeAvila's original presentation, "Testing Ton- 
terias" has been graciously expanded to present additional 
considerations for our readers. "Some Cautionary Notes on 
Attempting to Adapt I.Q. Tests for use with Minority Chil- 
dren and a Neopiagetian Approach to Intellectual Assessment: 
Partial Report of Preliminary Findings" spells out more 
clearly some of the complexities involved in testing and 
assessment of Bilingual/Bicultural children. 

This is hopefully only a beginning effort by Title VII pro- 
jects to deal with an area where there are far too few experts 
and far too many novices attempting to tackle problems that 
are "Anglo" created, perpetuated and rewarded. As we continue 
the struggle it is imperative that the fruition of our 
efforts be shared. In this regard I welcome your comments 
and suggestions and promise to continue this effort in the
San Francisco Bay Area. 



Requests for further information on this subject may be 
addressed to Mr. Joe Ulibarri, Director, Multilingual Assess- 
ment Program, 1111 No. El Dorado, Stockton, California, 
or to me at 1414 Walnut Street, Berkeley, California 94709. 



Olivia Martinez 
Bay Area Bilingual Education 
League 

June, 1972 



















CONTRIBUTORS 



Olivia Garcia Martinez received her M.S.W. in School Social 
Work from the University of California, Berkeley. She received 
an advanced credential and training in School Psychology from 
the California State College, Hayward, and is presently the
Coordinator of Testing and Evaluation for the Bay Area Bilingual 
Education League. 

Dr. Barbara Havassy received her Ph.D in Social Psychology 
from the University of Colorado. She is presently a consultant 
to the Multilingual Assessment Program in Stockton. 

Eduardo A. Apodaca is the Director of Project Hacer Vida,
Title VII Bilingual Education program, Office of Riverside
County Superintendent of Schools, Coachella Valley Branch.

Dr. Edward A. DeAvila received his Ph.D from York University.
He is a Developmental Psychologist and the current Research
Director for the Multilingual Assessment Program in Stockton.



Foreword 



by Olivia G. Martinez 

Bilingual Education originated in 1967 as an amendment 
to the Elementary and Secondary Education Act. It provided 
for the development and inclusion of a Bilingual Education 
program in districts that contained a sizeable number of 
Spanish-Speaking pupils. Prior to this time school districts 
that served predominately Spanish-Speaking pupils concentrated 
on crash "ESL" or "English as a second language" programs 
that were designed to teach English as soon as possible so 
that the native Spanish speaker would "function" in a regular 
classroom. 

It is not insignificant that it came after generations 
and generations of Mexican-American pupils had already been 
"pushed out" of the educational system. California and the 
Southwest have long held the distinction of having the largest 
number of bilingual inhabitants. Bilingual needs have been 
around for a long time and were documented as early as 1934 
when Chicano Educators first made their plea to the Psycholo- 
gical Associations for testing and assessment in one's native 
tongue.¹ Indeed California's first constitution was written
in the Spanish Language! One can hardly turn a corner in 
California without a glaring reminder of the rich Spanish 
and Mexican heritage documented throughout the state. 



It is sad to note that bilingual education was recognized
as a valuable and necessary program for the Southwest
only after Congress saw fit to enact legislation to assist
the "political refugees" from Castro's Cuba. They drew the
very logical conclusion that if they were to welcome and
provide for the large influx of Cubans some provision had to
be made to accommodate their bilingual needs in education.

Thus the first monies were allocated to teach these unfor- 
tunate victims of a communist regime the language they 
would need to know for survival in their new country. Clearly 
the emphasis was on the acquisition of English. From there 
it was a fairly simple matter to make the generalization to 
the southwestern communities who were also seen to be unfit 
and unprepared because of their language differences to bene- 
fit from and contribute to American society. 

Today there exists a hodgepodge of programs under the 
banner of Bilingual Education, but not that many that actually 
practice what they preach. We at BABEL recognize a program
as such only when instruction is offered in the dominant 
language of the child. The child should be allowed to 
achieve mastery in his own tongue before introducing a 
formalized reading program in English. Even then the 
child should be encouraged actively to continue concept 
and vocabulary improvement in his first language. Research 
conducted on Bilingual Education in Canada revealed that 
pupils who were totally fluent in their first tongue and 
could read and write their own language had a much easier
time acquiring a second language fluently and even went
on to excel when compared to monolingual peers.²

In this country, where pluralistic education has been
a vague concept at best, Bilingualism and Biculturalism
have been viewed as a handicap! This is so despite the fact that
certain segments of society, like many European societies, have long
recognized the desirability of learning two languages, two
cultures, etc. Indeed one qualification for entrance to
colleges and universities was a foreign language program.

Yet, Chicanos have been admonished and discouraged from 
perpetuating our "ready-made" bilingualism/biculturalism. 
Nowhere is this "handicap" so evident as in the area of 
evaluation, testing and assessment. More Chicano children 
have been labeled, placed, tracked, grouped and guided on 
the basis of various test scores than on any other single 
factor in the classroom. While there is no hard data to 
substantiate this claim, there are considerable statistics 
to document the failure of the public school system in educating
bilingual, particularly Chicano, children. Sometimes
referred to as the "push-out" rate, this well documented, and
perhaps overdocumented, phenomenon in many cases begins with a
standardized test of some sort.

Aside from the "routine" testing for special educational
needs and placements, an additional phenomenon of testing for
program effectiveness has emerged as a serious concern to
Bilingual/Bicultural Educators. Bilingual Education is of
necessity an innovative program based on an innovative approach
to educating all children. How, then, can a traditional pre/
post test evaluation design, using traditional standardized
instruments, be expected to effectively evaluate an innovative,
multi-component program?

The Bay Area Bilingual Education League (BABEL) has
five major components: the Instructional Program, Staff
Development, Curriculum and Materials, Higher Education and
Media. To expect a standardized test or even a series of tests
to document the effectiveness of these highly specialized
areas is fallacious to say the least. Yet when school boards
and administrators attempt to evaluate a program, particularly
with regard to refunding or expansion, invariably one
hears the test scores being reported.

California has two required statewide programs for 
testing pupils in the public schools. They are the California 
School Testing Program and the testing required under the 
Miller-Unruh Basic Reading Act of 1965. The California School 
Testing Program began with a law passed at the 1961 Session 
of the Legislature for the purpose of revealing the status 
of California students with respect to the academic skills 
and content they have acquired. Amended in 1963, the act
requires intelligence, achievement and physical performance
testing. The tests adopted for use in the 1969-70
school year are the Lorge-Thorndike Intelligence Tests in
grades 6 and 12, the Comprehensive Tests of Basic Skills in
grade 6, the Iowa Tests of Educational Development in grade 12
and the California Physical Performance Test in grades 6 and
12. Intelligence tests are administered during the months
of October and November, achievement tests during the
month of October, and physical performance tests during
April and May.

The Miller-Unruh Basic Reading Act testing in grades 1,
2 & 3 was required in connection with a program to improve
reading instruction in the primary grades. The Cooperative 
Primary Reading Test is administered the first 10 school 
days in May. Test results are reported to the State Depart- 
ment of Education, and one of the uses made of the required 
testing is in establishing the system of priorities for 
funding under the Miller-Unruh Basic Reading Act. Also, 
test results are used for evaluation of reading programs 
on both the district and State levels.⁴

In addition to a concern over how well California pupils 
are doing compared to the rest of the nation, the state 
mandated testing program was seen as a means of prodding dis- 
tricts into revamping instructional procedures. This is 
apparently accomplished by publishing test scores in local 
papers where district by district comparisons as well as 
school by school comparisons could be made. Thus we have a 
situation where districts and schools are first rewarded 
for low test scores (qualifying for the Miller-Unruh funds) 
and then possibly penalized when significant growth is or is 
not reflected in the scores (evaluation for continued funding).

There is considerable evidence to document the inadequacy
of standardized tests for some minority and/or culturally
different, bilingual children. If one is dissatisfied
with this point of view (based on work done by Dr. Palomares,
Dr. Steve Moreno, George Sanchez and others) then he need
only refer to the various law suits pending on the misuse
of standardized test results for Spanish-Speaking children.⁵
Yet standardized tests continue to be treated as if they
do in fact adequately assess such children. The problem is
complex and emotionally charged. If one wants only to
know how well Bilingual/Bicultural children perform on
standardized I.Q. and achievement tests in comparison to
middle class children, and if one wants to know how well
minority children can do on a dominant culture value oriented
test (i.e. how well they can take Anglo tests), and if one wants
evidence of how implicit functional objectives of various
educational programs are failing to serve bilingual/bicultural
children, then that is a defensible position. Since a relatively
small percentage of people understand testing, test
development and statistical inferences, it is well to
consider that current uses of standardized tests "assume a
universality in community of experiences ... a test is valid
only to the extent that the items of the test are as common
to each child tested as they were to the children upon
whom the norms were based." The problem as I see it relates
to the fact that tests are not administered for the purposes
described above, or even with those notions in mind. Instead,
standardized tests are used as a reflection of the innate
and potential intelligence of children, as a predictor of
future accomplishments (remember the self-fulfilling prophecy),
as a device to group and label, and finally as proof of the
inadequacy and handicaps a bilingual/bicultural child brings
to the educational setting.

Dr. Uvaldo Palomares has described the unique motivational
style Chicano youngsters bring to the classroom. He
also discusses the concept of positioning and cultural divergence
in an attempt to document how standardized I.Q. tests
are not fair to Chicano children.⁶ We don't need any more
evidence. Most persons knowledgeable about tests and their
uses readily agree with George Sanchez's position that the
worth of test results lies in their proper interpretation
and in the assistance which such interpretation lends to furthering
the educational needs of the pupil. An I.Q. ratio,
as such, has no value. It is only when that measure is used
critically in promoting the best educational interests of
the child that it has any worthwhile significance to the
educator.⁷ Yet test publishers willingly demonstrate how to
collapse scores to yield a grade equivalent, I.Q. and percentile
rank, which require a tremendous stretch of the
imagination to be seen as helpful to the teacher.

I could provide pages and pages of anecdotal material,
including several personal experiences that would dramatically
illustrate the evils of testing minority students. However,
I reject the notion that Minority educators must continue to
perform before our advice is heeded. We know the danger
in using standardized tests is their misuse; the test
publishers know it, and many key educators know it. If the
State of California, by mandating such tests and allowing
their continued misuse, is the originator and perpetuator
of, say, tracking and labeling, what does that say for
California's commitment to equal educational opportunity?

Elsewhere in this publication is a description of a
testing and assessment workshop recently hosted in Berkeley
by BABEL. This meeting of approximately 150 evaluators,
psychologists, and educators was originally conceived because
of the dissatisfaction and concern of Chicano, Asian and
other bilingual educators with the continued use of standardized
achievement tests and traditional I.Q. tests.
As evaluators of Bilingual programs we were particularly concerned
about the use of such tests for programmatic evaluation.
The problem is multi-dimensional. Bilingual programs
need thorough evaluations. We must be able to assess where
we are going and how effectively. What is happening to
children in our programs that would not otherwise happen
to them? As discussed earlier in this paper, there is evidence
to suggest that routine testing and assessment of Bilingual/
Bicultural children is unhelpful, if not harmful. The simple
translation of existing tests is unsatisfactory and merely
results in presenting the same unacceptable, culturally biased
content in Spanish (sometimes changing the degree of difficulty
in the process). Development of new bilingual/bicultural
instruments is costly, time-consuming and would most likely
perpetuate the worn out concept of testing the child
and not the system. Besides, there is no one test in existence
today that adequately assesses Anglo children, let
alone the many and various programmatic components. Excluding
bilingual/bicultural children from existing state and
district testing programs suggests a continuation of the
"labeling by separation" tendencies we are attempting to
destroy.

A recent survey by the Multi-lingual Assessment Program 
in Stockton revealed the thirteen most commonly used tests 
in Bilingual Education Projects in California to be as 
follows: 

Culture Fair Intelligence Test
Van Alstyne Picture Vocabulary Test
Peabody Picture Vocabulary Test
Metropolitan Readiness Test
Inter-American Series
Goodenough Draw a Person
Lorge Thorndike*
Stanford Achievement*
Michigan Oral Language Test
Test of Basic Experiences
Metropolitan Achievement Test
Comprehensive Test of Basic Skills*
Cooperative Primary*

*State Mandated Testing Program



Many people have repeatedly criticized these instruments
as unhelpful. Few people have actually docu-
mented where these tests penalize or harm bilingual/bicul- 
tural children, and this was the ambitious task of this 
first workshop on Testing and Assessment. A second objective 
was to look at the Criterion Referenced system as an alter- 
native to traditional assessment. The proceedings of this 
workshop along with the resolutions passed describe how 
enormously complicated this task was and more than likely 
attests to the general naivete of persons using such tests. 
That is, it was only when groups attempted to document the 
so-called inadequacies of tests that they became truly 
aware of the intended uses of such instruments and how little 
they actually knew about them. In several instances, what
the author of the test intended, what the publishers
suggested and what the school personnel actually used the
tests for were all very different! Few persons took the
radical position of categorically condemning all tests for 
all purposes under any circumstances. However, few could 
deny that the gross misuses of tests historically and up 
to the present did warrant such considerations and that 
perhaps some sort of moratorium might be necessary as an 
interim measure. 

While we were unable to critique all the tests as
hoped, in general I felt many people left this workshop
more informed and more comfortable in their conviction
that standardized tests should be removed from their
position of sanctity and relegated to a more menial
place in education, but uncomfortably aware of the fact
that the blame for the devastating results labeling has had
on bilingual populations does not lie with the tests alone;
nor will the simple act of discontinuing their use provide
the solutions to our dilemma.

In the meantime, then, can we please turn our attention, 
energy and resources to alternatives to standardized testing, 
i.e. non-obtrusive measures, behavioral and affective areas 
and Criterion-Referenced Tests? 

THE CRITERION-REFERENCE MODEL 

Of the several alternatives presently available to us, 
the Criterion-Referenced Model appears to be the most promising. 

In an article by Rex Jackson entitled "Developing 
Criterion-Referenced Tests", a definition of Criterion-Referenc- 
ing is offered as follows: 

According to Wang (1969) a "criterion-referenced test
is an achievement test developed to assess the presence or
absence of a specific Criterion behavior described in an
instructional objective". The term appears to have been
introduced by Glaser (1963) in a paper in which he distinguishes
"criterion-referenced" from "norm-referenced" testing. In
the latter, an individual's test performance is interpreted
with respect to the performance of other individuals who
belong to some specified population. In contrast, the
interpretation of an individual's performance on a criterion-
referenced test is a behavioral statement (or set of statements)
that is made without reference to the performance of
other individuals.⁸ This system has also been referred to
as competency-based or even precision teaching. I feel
that essentially they are all the same thing; that is, they
all attempt to test what one has been teaching, not what
some test developer assumes has been taught.
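
The distinction drawn above can be made concrete with a small sketch. The Python below is illustrative only: the scores, the norm group, the wording of the objective, and the 80 percent mastery cutoff are all hypothetical values chosen for the example, not figures taken from Jackson, Wang, or Glaser.

```python
from dataclasses import dataclass


def percentile_rank(score: float, norm_group: list[float]) -> float:
    """Norm-referenced reading: percent of the norm group scoring below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100.0 * below / len(norm_group)


@dataclass
class Objective:
    description: str        # the instructional objective being tested
    items_total: int        # number of items written for this objective
    mastery_cutoff: float   # hypothetical fraction of items required for mastery


def criterion_statement(items_correct: int, objective: Objective) -> str:
    """Criterion-referenced reading: a behavioral statement about one pupil,
    made without reference to any other pupil's performance."""
    mastered = items_correct / objective.items_total >= objective.mastery_cutoff
    status = "has met" if mastered else "has not yet met"
    return f"Pupil {status} the objective: {objective.description}"


# Hypothetical data for illustration only.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30]
objective = Objective("adds two one-digit numbers", items_total=10, mastery_cutoff=0.8)

print(percentile_rank(20, norm_group))    # changes if the norm group changes
print(criterion_statement(9, objective))  # depends only on the pupil and the objective
```

The first result shifts whenever a different norm group is substituted, while the second depends only on what the pupil did on items written for the objective that was taught.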

Two bilingual education programs, one in Indio and the
other in Santa Ana, California, are currently using such
a model and initial indications are very promising.⁹

No one is willing to categorically state that Criterion-
Referenced Tests will provide the solutions to all our problems.
However, the model certainly appears to suit the needs of
Bilingual/Bicultural Education more readily than norm-referenced
or standardized tests. Let's keep testing in its
rightful place: as a mere tool in the educational kit
designed to educate and serve.






FOOTNOTES 



¹Romano, Octavio, Ed. Quinto Sol Publications, Berkeley,
California. El Grito, Vol. II.

²Lambert, Wallace E. and Tucker, G. Richard. "The
Home/School Language Switch Program in the St. Lambert
Elementary School, Grades K-5."

³U.S. Commission on Civil Rights, "The Unfinished
Education, Outcomes for Minorities in the Five Southwestern
States". October, 1971.

⁴California State Department of Education. "Miller-
Unruh Basic Reading Program". Annual Evaluation Report,
1969-70.

⁵Palomares, Uvaldo and Trujillo, Miguel P. First
Quarterly Project Report on "Examination of Assessment
Practices and Goals and the Development of a Pilot
Intelligence Test for Chicano Children". O.E.O. Grant,
Washington, D.C., October, 1971.

⁷Sanchez, George I. "Bilingualism and Mental Measures:
a word of caution". Chicanos: Social and Psychological
Perspectives (1971). Nathaniel Wagner and Marsha J. Haug, Eds.

⁸Jackson, Rex. Developing Criterion-Referenced Tests,
Test Development Division, Educational Testing Service,
Princeton, New Jersey. June, 1970. Distributed by ERIC
Clearinghouse on Tests, Measurement and Evaluation.

⁹Project Hacer Vida, Title VII Bilingual Education and
Diagnostic Placement, Santa Ana Unified School District,
Santa Ana, California.



TABLE OF CONTENTS 






PREFACE 

FOREWORD 

PART 1 TESTING AND ASSESSMENT WORKSHOP

1. Rationale for the Meeting
2. Wechsler Intelligence Scale for Children (WISC)
3. Comprehensive Tests of Basic Skills (CTBS)
4. Cooperative Primary
5. Lorge-Thorndike
6. Culture Fair Intelligence Test
7. Michigan Oral Production Test
8. Peabody Picture Vocabulary Test
9. Critique Guide
10. Position Statement
11. Resolutions

PART 2 PROFESSIONAL CRITIQUE OF THE NEW INTER-AMERICAN
SERIES

PART 3 ABSTRACT: A SYSTEM FOR CRITERION-REFERENCED
ASSESSMENT OF A BILINGUAL CURRICULUM

PART 4 SOME CAUTIONARY NOTES ON ATTEMPTING TO ADAPT IQ
TESTS FOR USE WITH MINORITY CHILDREN AND A NEO-
PIAGETIAN APPROACH TO INTELLECTUAL ASSESSMENT









PART 1 TESTING AND ASSESSMENT WORKSHOP 
Rationale For The Meeting 

In response to growing dissatisfaction among bilingual/ 
bicultural educators, evaluators and psychologists with the 
continued use of standardized achievement and traditional 
IQ tests, BABEL held a Testing and Assessment Workshop 
in Berkeley, California on January 27-28, 1972. In atten- 
dance were approximately 150 bilingual psychologists and 
evaluators, educators working in bilingual/bicultural pro- 
grams, and community representatives, from all over the 
Bay Area, Northern and Southern California, and Austin, San 
Antonio, Fort Worth and Crystal City, Texas. 

The conference was planned with three specific objectives 
in mind. First, while people have repeatedly criticized exist- 
ing tests being used in Bilingual Education Programs, few 
have actually documented where these tests penalize or harm 
bilingual/bicultural children. The first objective of the 
BABEL conference was to examine closely eight of these instru- 
ments and attempt to document harmful or inappropriate facets 
of them. The following tests, all used extensively in Bilingual 
Education Programs, were so discussed: 

WISC (Wechsler Intelligence Scale for Children)
CTBS (Comprehensive Tests of Basic Skills)
Cooperative Primary
Lorge-Thorndike
Inter-American Series: General Ability
Culture-Fair Intelligence Test
Michigan Oral Production Test
Peabody Picture Vocabulary Test





A second objective was to look at the Criterion Referenced 
models as a realistic alternative to traditional assessment. 

The third objective was to formulate and adopt a resolution(s)
for consideration in Sacramento and elsewhere in the country. 

The format of the conference was organized to facilitate 
the implementation of the above objectives. The conference 
opened on Thursday morning, January 27, 1972 with an informal 
coffee hour, followed by introductions and a welcome given 
by Dr. Rene Cardenas, Director of BABEL. Mrs. Olivia 
Martinez, Coordinator of Testing and Evaluation, gave a short 
background of the initial conception of the conference, and 
the responsibilities of those in attendance. The General 
Session was conducted by Dr. Ed DeAvila of the Multi-Lingual 
Assessment Program in Stockton, California. The text of his 
talk, "Testing Tonterias", is included here in this pamphlet. 
After a short break the entire group broke up into eight 
workshop sessions to evaluate the tests mentioned above. The 
workshop members were asked to examine, discuss and evaluate 
their test according to the following guidelines: vocabulary,
illustrations, directions, lay-out design, cultural implications,
translations, timing and scoring procedures, and
norming of the test. A copy of the critique guidelines can
be found in this pamphlet. (The workshop sessions lasted
into the late afternoon.) At the end of the sessions, each
workshop member was asked to summarize his findings about the
test in terms of the effectiveness and appropriateness of
the test for use with bilingual/bicultural children in Bilingual
Education Programs. The members were also asked to complete
and sign a position statement on the test, in which recommendations
for the future use of the test were stated:
continued use, modification, or discontinuation. A copy of the
position statement appears in this pamphlet. Late in the after- 
noon a general session was held in which the findings and 
recommendations of each workshop were briefly summarized, 
discussed, and legal strategies considered. 

The general session on Friday, January 28, 1972 dealt 
with alternatives to present standardized tests. Ed Apodaca, 
Director and Tomas Lopez, Evaluator of Project Hacer Vida 
spoke about "The Indio Criterion Referenced Model", and
explained to those in attendance the formation, objective 
and use of the Criterion Referenced tests. Mr. Ben Soria, 
Director and Norm Nicolson, Evaluator reported on the Santa 
Ana Evaluation plan. It was felt that the Criterion Referenced 
Models along with attitudinal surveys, self-concept measures 
and other affective considerations should provide an appro- 
priate and meaningful measure of program effectiveness. The 
rest of the morning and early afternoon was devoted to small 
grade level meetings in which the criterion referenced models, 
other alternatives, resolutions and position statements were 
discussed. Late in the afternoon another general session was 
held to draft group resolutions and discuss potential legal 
strategies. 

The participants were also asked to fill out a form
evaluating the various aspects of the two day conference. A
copy of this evaluation form is included in the pamphlet.
The general opinion of the conference participants was very
favorable.

BABEL is planning another Workshop to be held sometime 
during the school year 1972-73. This conference will concentrate 
on the Criterion-Referenced Models of assessment: how they
are constructed and how they are to be used. BABEL also
hopes to establish a means of training people in the uses
and implementation of the Criterion-Referenced models
in order that these models can be used in the trainees'
school districts.






Wechsler Intelligence Scale for Children (WISC)

The Wechsler Intelligence Scale for Children grew out 
of the Wechsler-Bellevue Intelligence Scales used with 
adolescents and adults. The WISC may be used with children 
ages 5 through 15. 

The WISC consists of 12 tests which are divided into two
subgroups identified as Verbal and Performance. The tests
of the scale are grouped as follows: Verbal: General Information,
General Comprehension, Arithmetic, Similarities,
Vocabulary and Digit Span. Performance: Picture Completion,
Picture Arrangement, Block Design, Object Assembly, Coding
or Mazes. Normally, 10 of these tests are given. Digit Span
and Mazes (or Coding) are considered supplementary tests to
be added when time permits, or used as alternate tests. While
the tests are identified as Verbal and Performance, and differ
as these labels indicate, they each tap other factors, among
them non-intellective ones, which produce other classifications
or categories that are important in evaluating the
individual's performance.

The theory underlying the WISC is that intelligence
cannot be separated from the rest of the personality. An
attempt is made, then, to take into account the other factors
which contribute to the total effective intelligence of the
individual. The WISC renounces the concept of mental age
as the basic measure of intelligence; I.Q.s are obtained by
comparing each subject's test performance exclusively with
the scores earned by others in his own age group, rather
than by comparing the performance with composite age groups.
Also, no attempt has been made to define the social and
clinical significance of any given I.Q.
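
The idea of scoring a child only against his own age group can be
illustrated with a minimal sketch. It is a simplification, not the
WISC scoring procedure: the actual test converts subtest scores
through published norm tables. The function name, the sample scores
and the use of a simple standard score are assumptions made for
illustration; the scale with a mean of 100 and a standard deviation
of 15 follows the usual Wechsler convention.

```python
from statistics import mean, stdev


def deviation_iq(raw_score, age_group_scores, iq_mean=100.0, iq_sd=15.0):
    """Place raw_score on an IQ-style scale relative to ONE age group.

    Hypothetical helper: a plain standard score, not the WISC norm tables.
    """
    group_mean = mean(age_group_scores)
    group_sd = stdev(age_group_scores)
    # How many standard deviations the child is from his own age group's mean.
    z = (raw_score - group_mean) / group_sd
    return iq_mean + iq_sd * z


# Invented raw scores for one age group (not real WISC data).
ten_year_olds = [38, 42, 45, 47, 50, 53, 55, 58, 62]

print(deviation_iq(50, ten_year_olds))  # a score at the group mean maps to 100.0
print(deviation_iq(62, ten_year_olds))  # a score above the group mean maps above 100
```

The sketch also makes the evaluators' later objection concrete: the
result depends entirely on who is in the comparison group, so a child
whose background differs from that of the norm group is measured
against a standard that may not apply to him.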

The group that evaluated the WISC was greatly concerned
with the cultural orientation of the test. It was definitely
felt that this test is not anywhere within the cultural
reference of bilingual/bicultural children. The test is
Anglo-culture oriented, and neither the illustrations
nor the vocabulary can be generalized to other cultures. The
consensus of the group was that this test is an unfair
instrument to use in measuring the I.Q. of bilingual/bicultural
children. When used with bilingual/bicultural children, the
WISC measures acquired acculturation to mainstream middle
class white culture, rather than I.Q.

In terms of directions and timing, the group felt that
the WISC creates problems for the bilingual/bicultural child.
The directions are too difficult in both the written and
oral forms for these examinees. Many bilingual/bicultural
children are unable to read the written directions because
of their initial problems in learning to read English. Often,
too, the oral directions are difficult because of the
unfamiliar vocabulary that is used. The WISC is a timed test.
The majority of the group was convinced that timed tests are
not valid for testing bilingual/bicultural children, because
they do not give an accurate picture of the actual abilities.
It was also felt that the WISC is too long, thus fatiguing
the examinees, and again, not giving a true picture of
ability.

The group was critical, too, of the lay-out design and
illustrations used in the WISC. It was felt that there are
few illustrations, and that there should be more available 
on the test. The lay-out design is also inadequate. There 
are too many items crowded onto each page, making the 
test confusing, especially for primary grade children. 

The group was adamant about the fact that this test was 
not developed for testing bilingual/bicultural children. 
Considering this fact, the group felt that translating this 
test directly into Spanish would not make it more valid. 

The group decided that a translation of the WISC would
have to take several things into account. First, a translation
would have to be correlated with classroom instruction
and activities in bilingual education programs, in order to
give the test validity. Secondly, any translated version of
the WISC would have to consider the many regional variables
in written and spoken Spanish in the United States. These
variables would have to be included in the test, and accepted
as correct responses where applicable.

The group felt that the results of the WISC are absolutely
confusing and meaningless for bilingual/bicultural children.
The group was concerned about the fact that the results of
the WISC would label bilingual/bicultural children and
negatively affect teachers' attitudes toward students. The
majority of the group seemed to feel that the WISC could
possibly be used only as a diagnostic test, but that it is
totally invalid as an intelligence test when used with
bilingual/bicultural children.

It should be noted that there is a movement in the 
California State Department of Education to initiate a 
project to renorm the WISC for the bilingual/bicultural popu- 
lation of the state. The evaluation group was definitely 
against the renorming of the WISC for the following reasons: 

A. There is a Spanish version of the WISC already
developed in Puerto Rico which is not desirable because
it does not include regional variations in the Spanish
language.

B. The researcher presently involved in the renorming
project is not bilingual/bicultural.

C. Research shows that bilingual children generally
do not benefit from taking a Spanish version of an
I.Q. test.

D. The population that would be normed is linguistically
very diverse in Spanish, which would make the renorming
of this test difficult, and the results, at best, vague.

E. The group rejects the use of I.Q. as a solitary
measure of the intelligence of bilingual children.

F. There is a need for the development of criterion-referenced
measures to determine the abilities of bilingual/bicultural
children.







In conclusion, the group seemed to feel that the WISC
cannot effectively evaluate either the success or weakness
of a bilingual program, the potential and I.Q. of
bilingual/bicultural children, or what these children learn in
bilingual education classes. It was concluded that the WISC does
not reveal to the classroom teacher how she might improve her
teaching. The group was concerned with the fact that taking
this test might definitely be harmful to the bilingual/bicultural
child, unless the test was used for diagnostic purposes.
The majority of this group seemed to feel that the WISC could
not be effectively modified for use in bilingual education
programs, and that new instruments should be developed to
replace the WISC.

The group recommended that the WISC be discontinued 
as an evaluative tool for bilingual/bicultural populations, 
but that its use be continued for individual diagnostic 
purposes on special children with certain learning difficul- 
ties. The group suggested "...that an organized group of 
bilingual/bicultural psychologists (i.e. through CASPP) 
recommend to the State Department of Education or to the 
State Legislature or to whomever can effect change in each 
state, that any existing version of WISC be discontinued as 
a measure of intelligence when used with bilingual/bicultural 
children."






The Comprehensive Tests of Basic Skills — CTBS 

The CTBS are a series (of batteries) of 10 tests in four
basic skills areas: reading, language, arithmetic and study
skills. There are four levels of the CTBS in the series,
designed as follows: Level 1 for grades 2.5-4.9; Level 2
for grades 4.0-6.9; Level 3 for grades 6.0-8.9; and Level 4
for grades 8.0-12.9. The overlapping levels provide the user
with a choice of level for use in Grades 4, 6 and 8.
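
The overlap between levels can be made explicit with a short
lookup. This is only an illustrative sketch: the grade ranges are
those listed above, while the table name and function name are
invented here and are not part of any CTBS material.

```python
# Grade ranges for each CTBS level, as listed in the text above.
CTBS_LEVELS = {
    1: (2.5, 4.9),
    2: (4.0, 6.9),
    3: (6.0, 8.9),
    4: (8.0, 12.9),
}


def applicable_levels(grade):
    """Return every CTBS level whose grade range includes the given grade."""
    return [
        level
        for level, (low, high) in CTBS_LEVELS.items()
        if low <= grade <= high
    ]


print(applicable_levels(3.0))  # [1]
print(applicable_levels(4.5))  # [1, 2]  (two levels to choose from in Grade 4)
print(applicable_levels(6.5))  # [2, 3]
print(applicable_levels(8.5))  # [3, 4]
```

Grades that fall in an overlap return two levels, which is the
"choice of level" the description refers to.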

These tests were designed to measure the extent to which 
the individual student has developed the capabilities and 
learned the skills which are pre-requisite to the study of 
specific academic disciplines. The emphasis in this series 
is on the measurement of (the grasp of) broad concepts and 
abstractions developed by all curriculums, and on facility 
in such skills as classifying, manipulating, translating 
and interpreting, which are needed in the effective use of 
language and number. These tests are not like basic achieve- 
ment tests in that they are not affected by the content 
material used to teach students. Performance is affected 
by the grade level at which topics are introduced into the 
curriculum and by the development of the necessary capabilities 
to perform the tasks. 

The test items in the CTBS for the four skills mentioned
above generally measure the following: the ability to
recognize and/or apply techniques, including performing
fundamental operations; the ability to translate or convert
concepts from one kind of language (verbal or symbolic) to
another; the ability to comprehend concepts and their
interrelationships; the ability to extend interpretation beyond
the stated information.

In evaluating the CTBS for use with bilingual children 
and/or in a bilingual program, there were several general 
considerations that concerned the group of evaluators. 

Of primary concern was the fact that the CTBS is oriented 
toward the Anglo culture, Anglo study skills and school 
situations and obviously, Anglo use of the English language. 
This orientation might make the CTBS fairly effective when 
used with white middle class Anglo students. The same 
orientation renders the CTBS highly ineffective and inappro- 
priate when used to evaluate the abilities of bicultural or 
bilingual students. There is little in the CTBS that 
bicultural/bilingual children can relate to and it is signi- 
ficant that tests like the CTBS do not have multi-cultural 
considerations, so as to be appropriate for those who must 
take the test. Thus, it was felt that the CTBS is being 
used presently in the state of California not because of its 
effectiveness, but for two very different reasons. First, 
it is a state mandated test, that is, it is designated for 
use in the public schools of California by the State
Department of Education. Secondly, results of the CTBS show
bicultural/bilingual students functioning far below grade
level, and these results are used as a vehicle for obtaining
state and federal financial aid for various school districts.

The group was also concerned about the directions used 
in administering the CTBS. It was felt that the bilingual/ 
bicultural child could have difficulty understanding the 
directions of this test because of a possible limitation in 
knowledge of English, and his problems with written English.

It was decided that translating the directions into written 
Spanish would not be beneficial to the bilingual child 
because of the fact that many Spanish-speaking children in
the United States are illiterate in Spanish. 

It would be invalid to translate the existing CTBS into
standard Spanish, not only because of the illiteracy problem,
but also because regional differences in both Spanish
language and culture make it very unlikely that the test
could be normed for a general area or region of the country.
The possibility of a national standardized test for bilingual 
children was discussed in these terms, and was rejected by 
the group of evaluators. 

The evaluators considered the lay-out design of the 
CTBS very confusing for the examinees. The pictures, 
questions and phrases are poorly spaced on the pages of the 
test. They are cramped together and present an unorganized 
and indiscriminate set of stimuli for the examinees. 

The group rejected the concept of the timed test. The
timed test adds a great deal of tension and pressure to the
testing situation, and thus tends to give an imprecise picture
of the examinees' abilities. It was also felt that the
CTBS is too long and tends to fatigue the examinees. The
fatigue factor can affect the results of the test.

Another consideration that the group was concerned
with was the scoring procedures of the CTBS. They felt
that there is little correlation between the test scores and
actual classroom behavior. There seem to be many instances
where the examinees do poorly on the test but are progressing
fairly well in the classroom. It was also felt that any type
of score on a test such as the CTBS is dangerous in that it
tends to be misused by classroom teachers in labeling students
and student potential, thus creating a particular bias
in teacher attitudes toward students.

The group that evaluated the CTBS came to several
conclusions about the appropriateness of this test for use with
bilingual/bicultural children. It was decided that the CTBS
effectively evaluates neither the potential of the
bilingual/bicultural child, nor what a child has learned in
bilingual education classes, nor the successes or weaknesses of
bilingual education programs. It was felt that the CTBS is
also ineffective in revealing to the classroom teacher
how she may improve her teaching. Moreover, and more
importantly, it was decided that taking this test is of negative
value to the bilingual/bicultural child, and may very possibly
be harmful to him. The majority of the group felt that
there was a possibility that this test could be modified
for use in a bilingual education program, but that the success
of such modification was doubtful.

The evaluation group recommended that the CTBS be
discontinued as an evaluative tool for bilingual/bicultural
populations. The group felt that, "A new instrument should 
be developed which takes the child's cultural reference and 
language capabilities into consideration. The CTBS scores 
are not only invalid, but in many instances are detrimental 
to the self-concept of bilingual/bicultural students." 



Cooperative Primary Reading Tests 

The Cooperative Primary Tests are carefully constructed 
and standardized general achievement tests. As such, they 
may be expected to serve a wide variety of educational and 
administrative purposes. One of the major purposes of the 
test series supposedly is to provide teachers with measures 
of children's concepts and skills that closely relate to 
their work in the classroom.

The skills being tested on the Cooperative Primary
Reading Tests are representative of three categories:
Comprehension, which asks for identification of an illustrative
instance and/or identification of an associated object or
instance; Extraction, which asks for the extraction of an
element or elements, the extraction of an element in order,
or the identification of an omission; and Interpretation,
Evaluation and Inference. There are no time limits. The
children are allowed a "reasonable" amount of time to
finish the test.

The Cooperative Primary Reading Tests were normed in 
April of 1966 in four regions of the United States. Approxi- 
mately 1700 public school children at 170-176 schools made 
up the sample at each grade level. The data gathered from 
these administrations were used to develop scaled scores, 
percentile ranks, stanines and grade equivalents. The 
Cooperative Primary Reading Tests yield one score: the
total number of correct responses. The most widely used
means of interpreting this raw score for these tests is
percentile ranks. In some cases test scores may provide the
teacher with important clues about the achievement of a
child. "In most cases, however, test scores will serve
primarily as verification of her judgement." (manual p.8)
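
How a raw score is turned into a percentile rank can be sketched
briefly. The example below uses one common definition (the percent
of the norm sample scoring lower, plus half of those with the same
score); the test manual may use a different rule, and the function
name and sample scores are invented for illustration only.

```python
def percentile_rank(raw_score, norm_scores):
    """Percent of the norm sample below raw_score, counting ties as half.

    Hypothetical helper using one common definition of percentile rank.
    """
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Invented raw scores standing in for a norming sample (not real test data).
norm_sample = [12, 15, 18, 20, 20, 23, 25, 27, 30, 34]

print(percentile_rank(20, norm_sample))  # 40.0
print(percentile_rank(34, norm_sample))  # 95.0
```

Because the rank is computed entirely against the norm sample, it
says where a child stands relative to the children who were normed,
which is the limitation the evaluators raise in the paragraphs that
follow.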


There was general agreement among the members of the
group that evaluated the Cooperative Primary Tests that
these tests have some value when used with the intended
population (similar to the norming population): middle class,
monocultural, English-speaking children. The group unanimously
felt that these tests do not have validity when used with
bilingual/bicultural children. In coming to these conclusions,
the group dealt with several basic considerations.

First, the group was concerned with the vocabulary items
used on the Cooperative Primary Test. The vocabulary is much too
advanced, not only for a heterogeneous group of examinees that
includes bilingual/bicultural children, but most probably for any
child taking the test. It was felt that many of the items
are difficult, and inappropriate, because they are regionally
oriented. Words like "mitten" and "snowman" are, for many
pupils, inappropriate items when used in certain regions of
the country. Many of the evaluators also felt that the 
vocabulary used in the directions is difficult, and particu- 
larly so for bilingual/bicultural children. A major criticism, 
too, was that the directions are too lengthy, almost to the 
point of becoming unclear. 

Of particular concern to this group of evaluators was
the general lay-out design and visual presentation used in
the Cooperative Primary. It was felt that the lay-out
is definitely too crowded; both picture and vocabulary items
are packed together on each line and each page. The total
effect is very distracting and confusing for the children.
One evaluator made the comment that children are not taught
to read in the manner in which the words are positioned on
this test. The group was critical, too, of the illustrations
used. While the quality of the pictures themselves is
acceptable, it was felt that some of the illustrations are
misleading; for example, the snowman, snail, mitten and elephant
do not accurately depict the desired responses. There was
concern among some of the evaluators, too, that several of
the pictures are not easily identifiable to bilingual/bicultural
children. "The children have never been exposed to
many of the pictures given." An additional suggestion was
made to darken the lines of the arrows in which the cues
are written.

Although the Cooperative Primary is not a timed test,
the group seemed to feel that bilingual/bicultural children
would not have sufficient time to finish the test. There
was also the feeling that the competitive factor is
significant: "The administration of a test that requires competition
is not usually beneficial to bilingual/bicultural children."
The group also felt that the length tends to frustrate children.

The group was adamant in its opinion that the results
of the Cooperative Primary Tests are unclear and meaningless
for bilingual/bicultural children. In this case, it was felt
that the results not only do not help a teacher to understand 
and help students, but are also misleading and harmful. "Tests 
such as this are the basis for pegging minority children and 
placing them in NR classes." "It is mandated by the state 
of California, an absurd requirement." "A crime to use it." 

The group came to some final conclusions about the 
Cooperative Primary Reading Tests and their appropriateness 
for use with bilingual/bicultural children. It was felt that 
this achievement test evaluates neither the potential of 
bilingual/bicultural children, nor what a child might learn 
in bilingual education classes. This test could not evaluate 
the successes and weaknesses of a bilingual education program 
either, nor does its use necessarily reveal to the teacher 
how she might improve her teaching. There was a definite 
consensus among the group that this test is of negative 
value and harmful if given to bilingual/bicultural children. 

The group felt that there was no realistic possibility of 
modifying this test for use in bilingual education programs. 
"Modification would merely give a semblance of validity to 
an invalid instrument." 

The evaluation group recommended that the Cooperative
Primary Reading Tests be discontinued as an evaluative tool 
for bilingual/bicultural populations. Many group members 
also felt strongly that this test should not be used under 
any circumstances for bilingual/bicultural children. 




Lorge-Thorndike Intelligence Tests 

The Lorge-Thorndike Intelligence Tests are a series of 
tests of abstract intelligence. That is, they test the abi- 
lity to work with ideas and the relationships among ideas. 

The tests are based upon the premise that most abstract 
ideas with which school children or working adults deal are 
expressed in verbal symbols. Thus, verbal symbols are the 
appropriate medium for testing abstract intelligence. These 
tests take into account the fact that for some - "the young, 
the poorly educated, or the poor reader" - printed words are 
an inadequate measure of abilities. Consequently, a parallel 
set of nonverbal tests is provided. 

There are two batteries of tests. The Primary Battery 
is used with subjects in kindergarten through third grade, 
and consists of two levels. There are three subtests for each
level: Oral Vocabulary, Cross-Out and Pairing. Each requires
less than ten minutes for administration. However, the test 
is untimed and the administrator adjusts the pace to the 
students. The Multi-Level Battery tests subjects in third 
grade through college and has eight levels. The term "multi- 
level" indicates that there is a graded series of items 
divided into eight different but overlapping scales for use 
within the grade range. There is a separate series of items 
for each grade in the lower end of the overall grade range 
and a separate series of items for each pair of grades in 
the upper part of the grade range. 



The Verbal series of the multi-level tests is made up
of five subtests which use only vocabulary items: vocabulary,
verbal classification, sentence completion, arithmetic
reasoning and verbal analogy. The Nonverbal series uses items which
are either pictorial or numerical. It contains three sub- 
tests involving picture classification, pictorial analogy, 
and numerical relationships. The working time for the 
Verbal series is 35 minutes and for the Nonverbal series is 
27 minutes. It is suggested that both series of tests be 
used for the appraisal of children in schools. 

The Lorge-Thorndike Intelligence Tests were evaluated 
by a group that was concerned with several general considera- 
tions. 

Of primary concern to the evaluators was the fact that 
the Lorge-Thorndike is currently being used in the state of 
California, as a state mandated test. It was felt by the group 
that the test is ineffective, and has been indiscriminately 
designated for use in the public schools. Secondly, the group 
felt that the Lorge-Thorndike is being used at the expense 
of bilingual/bicultural children. The results usually show 
bilingual/bicultural students functioning far below grade 
level and these results are used as a vehicle for obtaining 
state and federal financial aid for various school districts. 

The evaluators considered the Lorge-Thorndike to be
culturally biased. The test is definitely oriented toward
the Anglo culture, and offers little that is within the
bilingual/bicultural student's cultural reference. This
renders the Lorge-Thorndike highly ineffective and
inappropriate for measuring the I.Q. of these examinees. The
group felt that any test like the Lorge-Thorndike must have
multi-cultural implications, so as to make it appropriate
for those who must take the test.

The group felt that the directions used in administering
the Lorge-Thorndike Tests are too difficult for the
bilingual/bicultural child, because of his possibly limited
knowledge of English and problems with written English. It was
decided that translating the directions into written Spanish would
not be beneficial to the bilingual child because of the fact
that many Spanish-speaking children in the United States are
illiterate in Spanish. Direct translation of the Lorge-Thorndike
into Spanish would solve nothing. It would be invalid
to translate the existing Lorge-Thorndike tests into standard
Spanish, not only because of the illiteracy problem, but also
because of the regional differences in the Spanish spoken by
children who would take such a test. Regional variables both
in Spanish language and culture make it very unlikely that the
test could be normed in Spanish for a general area or region
of the country.

The group also rejected the concept of the timed test.
The time competition on the Lorge-Thorndike definitely puts
the bilingual/bicultural child at a disadvantage. The timed
test adds tension to the testing situation, and tends to
give an imprecise picture of the examinees' abilities. It was
also felt that the Lorge-Thorndike is too long and tends to
fatigue the examinees.

The group was also concerned with the scoring
procedures of the Lorge-Thorndike tests. They felt that the results
are meaningless for bilingual/bicultural students. They
also seemed to think that there is little correlation between
the test scores and actual classroom behavior. There seem
to be many instances where the examinees do poorly on the
test but are progressing fairly well in the classroom. It
was also felt that any type of score on a test such as the
Lorge-Thorndike is dangerous in that it tends to be misused
by classroom teachers in labeling students and student
potential, thus creating a particular bias in teacher
attitudes toward students.

Some comments made by individual evaluators in the 
group are interesting and valid: 

"The Lorge-Thorndike is basically a reading test— there- 
fore, if a child can't read, he is unable to take the test. 

The test is much too difficult for the bilingual child...". 

"I can see no positive values to the child taking the 
test. It should not be used for tracking children."

"This test is a reading test. Not only is this test 
ineffective for bilingual children, but for any child that 
cannot read." 

The group came to the following conclusions: The
Lorge-Thorndike does not effectively evaluate the success or
weakness of a bilingual program. The test does not measure the
potential of bilingual/bicultural children, nor their I.Q.,
nor what they learn in bilingual education classes. The
group felt that the test does not reveal to the teacher
how she may improve her teaching, but allows her to label
children based upon the test scores. It was the general
consensus of the group that this test is of no positive
value to the child, but, on the contrary, tends to be
harmful. The majority of the group felt that modification
is not the answer in dealing with the Lorge-Thorndike
tests.

The group recommended unanimously that the Lorge-Thorndike 
tests be discontinued as an evaluative tool for bilingual/ 
bicultural populations. There were also several
recommendations that this test not be used under any circumstances for
bilingual/bicultural children. 







The Culture Fair Intelligence Test 

The Culture Fair Test is an intelligence test which
claims to measure intellectual capacity or potential;
that is, it measures the child's capacity to learn in the
future, rather than his already learned scholastic skills.
The test is perceptual and nonverbal and thus claims to give
fair predictions of future potential when used with children
from diverse homes and cultural backgrounds. Experiential
differences, i.e. opportunities in the years preceding the
test, have been shown to have a powerful effect on the outcome
of most intelligence tests now in use, but the Culture Fair
Test claims to have been used with equal success on children
from various backgrounds.

The Culture Fair Test consists of three scales: Scale
1 for ages 4 through 8; Scale 2 for ages 8 through 14 (grades
3 through 9); and Scale 3, designed to discriminate in the
upper ranges of intelligence for young adults and adults.
Scale 2 consists of two parallel forms, A and B, each totaling
46 items arranged in four subtests and covering 12 1/2
minutes of test time. The two forms permit a rest pause
halfway through the testing, an interruption of the test for
completion on another day, or an I.Q. assessment based on a
single form if time is short.
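
The figures given for Scale 2 imply a rough pace per item, which a
few lines of arithmetic make visible. The sketch assumes, purely for
illustration, that the working time is spread evenly over the items;
the text does not say how the time is divided among the four
subtests, and the function name is invented here.

```python
def seconds_per_item(items, minutes):
    """Average working time per item, assuming time is spread evenly."""
    return minutes * 60 / items


# One form of Scale 2: 46 items in 12 1/2 minutes, as stated above.
pace = seconds_per_item(items=46, minutes=12.5)
print(round(pace, 1))  # about 16.3 seconds per item
```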

The Culture Fair Test has four subtests in each of the
two forms. The content is figural and geometric, and each
subtest involves a different kind of test question: Series,
Classification, Matrices, Conditions (topology). It is
possible to use the same form or forms for retest or for
testing at yearly intervals.

The group that evaluated the Culture Fair Intelligence Test 
was primarily concerned with the fact that this test is not 
a valid measure of I.Q. for bilingual/bicultural children. 

It was felt that the Culture Fair provides some measure
of abstract reasoning and of spatial perception, and as such,
the test does have some validity. However, the group was
critical of various aspects of the test in its present form.

First, the group was concerned with the English
vocabulary used in the directions of the test. It was felt that
the vocabulary is very difficult for bilingual/bicultural
children. The directions are lengthy and ambiguous, and
certainly not appropriate for the age groups being tested.
The directions have been translated into standard Spanish, but
the group found the translation to be unsatisfactory. Because
the test is translated into standard Spanish, no consideration
is given to regional differences in the language. The Spanish
used here is very sophisticated and literary, thus possibly
inappropriate for use with some bilingual/bicultural children
in the U.S. "The words used in the Spanish directions are
terms that not many Mexican-American children can understand."

Because the test is nonverbal, and is composed of various
geometric figures, it was felt that the quality of the
lay-out design and the illustrations is of special importance.
The group was critical of the lay-out of the test. The items
are very crowded: "...there is an overwhelming amount of
clutter...". It was decided that the test definitely needs
a new lay-out, in which items are better spaced on the pages.
There was some concern as to the appeal of these geometric
figures for small children. Are the figures interesting to
the children, or are they frustrating?

This test is designed to be culture-free, or culture- 
fair, and thus, fair in its measurement of the I.Q. of all 
children. However, the evaluators were concerned with one 
thing which they felt made the test culturally biased. The 
test is timed, and makes competition an important factor in 
taking this test. The evaluators referred to the
time-competition factor as the "American Approach". It was felt that
this factor is often culturally alien to bilingual/bicultural 
children, that it creates tension and frustration, and 
handicaps them in the testing situation. The group mentioned, 
too, that the time allotted was definitely inappropriate for 
those taking the test. Distinguishing between geometric 
figures is a very precise and demanding activity and demands 
more time for the testees than is allotted. The group also 
concluded that the length of the test is definitely fatiguing 
to the children considering the types of activity demanded here. 

In conclusion, the evaluators decided that the Culture 
Fair Intelligence Test in its present form, and with its 
present purpose, does not effectively evaluate either the 
strengths and weaknesses of a bilingual program, or the I.Q. 
of bilingual/bicultural children. It was felt that it cannot
effectively evaluate what a child has learned in bilingual
education classes unless his curriculum has focused on spatial 
perception and abstract thinking. "If this type of test is 
to be administered to children, then they should be taught 
perceptive and abstract reasoning concepts in the classroom..." 

It was felt that the test is positive in that it may 
reveal to a teacher how she can improve her teaching, in 
terms of focusing on perceptual concepts and abstract reason- 
ing. The group seemed to feel that the test may have positive 
value for the children as well if it is used correctly. What 
looms as a negative factor for the children is the time
competition factor of the Culture Fair test. In general, it
was felt that this test can and should be modified, by
changing the focus of the test from I.Q. to diagnosis. "It can
be used as a diagnostic tool in order to help teachers find
areas of weakness, rather than as an instrument to stratify
some children."

The group recommended that the Culture Fair Intelligence 
Test be discontinued as a measure of the I.Q. for bilingual/ 
bicultural children. They recommended that with modification 
this test could be used for diagnostic purposes, and as a 
measure of spatial perception and abstract reasoning with 
individual children or small groups. "This instrument should
not be labelled as an I.Q. test, but as a test of perception
and abstract reasoning. This instrument could have great
implications for the development of curriculum and teacher
training, and as a diagnostic tool."



Michigan Oral Language Productive Test: Structured Response

The Michigan Oral Language Productive Test is based
upon the Dade County Test of Language Development; the
original test has been revised and enlarged to 43 items.

The purpose of the test is to assess a child's ability to 
produce standard grammatical and phonological features when he 
speaks English. 

The method used in administering this oral language test 
is the following: The child is shown three pictures which 
form a story. He is given a Stimulus concerning one of 
the pictures. The stimulus is structured so that the child 
will give a response containing a particular feature of grammar 
or pronunciation. For example: 

Question 5: Past Participle

Stimulus: (point to boy in picture) (Child's name), ask the boy if he always
goes to this river to fish.

Response: Have you always . . . .

It is stressed that the standard stimulus (given in the test
manual) always be given as it is written, in order that there
be a cue that evokes the desired response. It is also very
important when using the Michigan Oral Language Productive
Test to set the child at ease before beginning the test (provide
a verbal warm-up period), and to praise the child when
he speaks, with moderately positive comments such as "fine"
or "You're giving me lots of answers."

There are 43 items on the test, which should take approximately
15 minutes to give. These items represent 11 categories
of grammatical and phonological features: uses of be, uses
of have, comparison, uses of do, double negative, past tense,
past participle, plural, possessive, pronunciation, and subject-verb
agreement. The scoring sheet provides for various alternatives
to the standard desired responses. From this sheet,
category percentages for the eleven categories can be determined.
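
The category percentages mentioned above could be tallied along the following lines. This is only a minimal sketch: the item-to-category mapping, the item numbers, and the function name `category_percentages` are hypothetical, and it assumes each percentage is the share of a category's items answered with an acceptable response (the source does not say whether the scoring sheet counts acceptable responses or errors).

```python
from collections import defaultdict


def category_percentages(item_categories, acceptable_items):
    """Percent of items in each category answered acceptably.

    item_categories: dict mapping item number -> category name
    acceptable_items: set of item numbers scored as acceptable
    """
    totals = defaultdict(int)
    acceptable = defaultdict(int)
    for item, category in item_categories.items():
        totals[category] += 1
        if item in acceptable_items:
            acceptable[category] += 1
    return {c: 100.0 * acceptable[c] / totals[c] for c in totals}


# Hypothetical five-item excerpt; the real test has 43 items in 11 categories.
item_categories = {
    1: "uses of be",
    2: "uses of be",
    3: "past tense",
    4: "past tense",
    5: "past participle",
}
acceptable_items = {1, 3, 4}
percentages = category_percentages(item_categories, acceptable_items)
```

A teacher reading such a tally would look for the categories with the lowest percentages as the likeliest areas of language need.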

It is stressed that the value of the structured response 
test is its ability to give the teacher a quick overview of 
her student's language needs. The more efficient the curricu- 
lum is in meeting the students' language needs, the more 
quickly the overview is likely to change. 

The Michigan Oral Language Productive Test was evaluated 
briefly, and the group seemed to find some value in using the 
test with bilingual/bicultural children, although it does not 
evaluate the potential or the I.Q. of a bilingual/bicultural 
child. It was felt that the test does effectively evaluate 
what a child has learned in the English or ESL component of a 
bilingual education program. Similarly, the test does point 
out the strengths and weaknesses of the English language or 
ESL curriculum of a bilingual program, by pointing out the 
particular language strengths and needs of the children. And, 
too, the Michigan Oral Language Productive Test can show a teach- 
er where she needs to improve her language teaching, although 
it does not reveal exactly how she may improve it. 

In dealing with the question of the positive or negative
value of the test for the bilingual/bicultural child, the
group placed responsibility with the test administrator, 
the teacher. Most of the group felt that taking the test 
can be a threatening experience for the child if the adminis- 
trator is unable to put the child at ease and make him feel 
that his responses are successful. It was felt that the
test can have positive value if the child is skillfully 
praised for his responses and can take the test in a comfor- 
table and relaxed environment. 

It was suggested that the Michigan Oral Language Productive Test
can be successfully modified to meet more of the evaluative 
needs of bilingual education programs. One suggestion was 
to change the stimuli, or the order of the stimuli in order 
to bring the test closer to actual classroom curriculum. 
Another suggestion was to make the test more bicultural by 
making the test pictures relevant to the culture of the
examinees, rather than settling for "Anglo prototype" 
pictures. A final concern was that this test "needs guide- 
lines to determine what sequence learning should take.... 
the test assumes teacher objectivity and ability to improvise 
beyond the capabilities of some teachers." 

The majority of the group recommended that the test 
be continued for use with bilingual/bicultural populations, 
or for individual diagnostic purposes on special children. 
Most of the members had reservations about complete and open 
endorsement of the Michigan Oral Language Productive Test: "until
something better is developed"; "only with modification";
"to help develop ESL lessons for a particular class only";
"for testing ESL only"; "to measure the extent to which
child speaks English only".



The Peabody Picture Vocabulary Test 

The Peabody Picture Vocabulary Test is designed to give
an estimate of a child's verbal intelligence, by measuring 
his hearing vocabulary. The test consists of 150 plates 
preceded by three example plates. The examiner asks the exami- 
nee to identify various vocabulary items by pointing to the 
picture on each plate that best tells the meaning of each 
item. The test is untimed, but usually takes 10 to 15
minutes.

There are two forms to the test, A and B, each consisting 
of 150 plates. The plates are arranged in empirically-deter- 
mined order of difficulty. A fairly even number of plates 
are placed at each age level with a somewhat heavy concentration 
at the pre-school levels. The four vocabulary items used to 
make up each plate were selected based upon the following cri- 
teria: all four words were found to be at the same difficulty 

level; all four words demonstrated good linear growth curves; 
words were used where no sex differences were found to exist; 
primarily singular and collective nouns, some gerunds, and a 
few adjectives and adverbs were used; words were omitted which 
seemed to be biased culturally, regionally, and racially, 
as were dated words, plurals, double words, scientific terms, 
etc. 

The illustrations were selected based upon the following 
criteria: equal size, intensity, and appeal, and appropriateness
to the age level of the subjects most likely to
view the plate.

Besides being effective with average subjects, the PPVT
has special value with certain other groups of subjects. 

Since subjects are not required to read, the test is 
used with non-readers and remedial reading subjects. Because 
the responses are non-oral, the test is appropriate for 
children with speech impediments, and for certain autistic 
or withdrawn children. The test has also been used with 
handicapped and perceptually impaired subjects. "The scale
may be given to any English speaking resident of the United
States between 2 years 6 months and 18 years who is able
to hear words, see the drawings, and has the facility to
indicate 'yes' and 'no' in a manner which communicates."
(Manual, p. 25)

The group that evaluated the Peabody Picture Vocabulary 
Test felt that in general this test was one of the best instru- 
ments that is currently available for measuring the capabi- 
lities of bilingual/bicultural children. It was the consensus 
of the group that the basic structure of the test is good, 
that the directions are clear and appropriate for the children 
being tested, that the lay-out design is good, and that the 
untimed nature of the test is a positive quality. There 
were, however, several negative considerations that the 
group discussed in relation to the PPVT. 

First, it was felt that the illustrations used in the 
PPVT are very flat and not as appealing as they could be. The 
group suggested that the pictures should be in color, and
should be larger for use with the very young examinees. The
group found that some of the items were ambiguous, for example,
item 22, which contains pictures of what appears to be a
boat on a river, some vegetables, a rosebush and two mountains.
Finally, and very importantly, it was felt that many of the 
illustrations have cultural references that are not easily 
identifiable to the bilingual/bicultural child (items 31, 44,
57, 58, 62, 64, 27, 69, 70, 71). In regard to cultural
implications the group seemed to feel that many of the 
items in the PPVT are not fair to bilingual/bicultural children. 
Few of the items are reflective of the cultures of bilingual 
children. It was felt that as the test progresses, the items 
definitely become more culturally complicated, and move further 
away from what the bilingual/bicultural child can relate to
culturally. For example, item 64 contains the pictures of a 
fencer, prim lady with pencil and paper, an old woman giving 
a speech at a podium and a chef standing at the stove, which 
are all strange items for some bilingual/bicultural children. 

It is interesting to note that this test was normed on a 
population of white Anglo children living around Nashville, 
Tennessee. 

The group was also concerned with the vocabulary used 
on the PPVT. Not only are many of the pictures on the test 
oriented toward Anglo culture, but the desired vocabulary
responses also reflect this orientation. The group seemed 
to feel that the first ten items and vocabulary responses are 
applicable to the bicultural/bilingual child, but that after
item ten, the vocabulary becomes increasingly more difficult
and more unrelated to the child's language experiences. They
cite as an example, item 22, which asks for the response
"bush". Item 67 asks for the response "stadium". Item
70 asks for "stunt". Item 71 asks for the word "meringue", 
and 72 asks for "appliance". The group proposed an evalua- 
tive study of words that are familiar and relevant to the 
bilingual/bicultural child. 

In discussing the possibilities of translating this 
test into Spanish, the problem of regionalism was of great 
concern. Spanish vocabulary items definitely vary depending 
upon the region of the United States. For example, the group 
cited several possibilities in Spanish for the word truck:
troca, camioneta, camión; for the word car: carro, auto,
automóvil; for the word baby: babito, niño, nene; for the
word teacher: maestra, profesora. The group concluded that
an extensive study of the regional differences in Spanish 
should be made. It was felt that after such a study, 
regional differences should be taken into account in the PPVT, 
and that these different forms should be included in the test 
and be accepted as correct responses in each particular region. 
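
One way to picture the group's proposal is as an answer key that lists the accepted forms separately for each region. The sketch below is illustrative only: the region labels `region_a` and `region_b` and the assignment of each Spanish form to a region are hypothetical placeholders, since the group called for a study to establish exactly that information, and the function name `is_accepted` is likewise an assumption.

```python
# Hypothetical regional answer key. The Spanish forms come from the group's
# examples; which form belongs to which region is NOT known and is invented
# here purely for illustration.
REGIONAL_FORMS = {
    "truck": {"region_a": {"troca"}, "region_b": {"camioneta", "camión"}},
    "car": {"region_a": {"carro"}, "region_b": {"auto", "automóvil"}},
    "teacher": {"region_a": {"maestra"}, "region_b": {"profesora"}},
}


def is_accepted(item, response, region):
    """Return True if the response is an accepted form for this region."""
    forms = REGIONAL_FORMS.get(item, {}).get(region, set())
    return response.strip().lower() in forms


accepted_here = is_accepted("truck", " Troca ", "region_a")
accepted_elsewhere = is_accepted("truck", "troca", "region_b")
```

Keeping the key per region, rather than pooling every variant, follows the group's wording that each form be accepted "in each particular region".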

In conclusion, the group decided that the PPVT does not, 
in its present form, effectively evaluate the success and 
weakness of a bilingual program, the potential of bilingual/ 
bicultural children, nor what these children learn in bilingual 
education classes. It was felt that the relative value of this 
test for the examinee depends very much on how the results are
used. It was stressed that this test should not be used
to measure I.Q., but more as a measure of vocabulary
comprehension and growth.

The evaluation group for the PPVT recommended that the 
test be modified. They felt that the basic structure of the 
test was a good one. An attempt to modify the test should 
concentrate both on changing the Anglo cultural orientation
of the illustrations and the vocabulary and on the problems 
of regionalism in Spanish vocabulary. "The PPVT has good 
possibilities for development as an evaluative tool for 
bilingual programs. The test, in a modified form, could be 
used as a means of measuring the overall success of a 
bilingual program. It could also be used to measure the 
progress of children in a bilingual education program over 
a year's time." 






BABEL TESTING AND ASSESSMENT WORKSHOP 



CRITIQUE GUIDELINES 



I. VOCABULARY 

a. Is the content appropriate?
i.e., do the words used
adequately reflect those of
the age group tested?

b. Degree of difficulty. Are the
words used too advanced or
too easy for the test level?

c. Visual presentation, positioning.
Are the words arranged
in an easy to read fashion?

d. Other 



II. ILLUSTRATIONS

a. Are they ambiguous? i.e., can
you tell easily what each drawing
is supposed to be?

b. Are the pictures of good quality?
i.e., appealing to children?

c. Cultural implications. Do they
depict items naturally and
easily identifiable with
Chicano or Asian cultures?

d. Other 



III. DIRECTIONS 

a. Are they clear? 

b. Are the words used to instruct 
the children appropriate for 
their age? 

c. Are they very lengthy so that the 
point becomes unclear? 

d. Other 






IV. LAY-OUT DESIGN 

a. Position of items - are the
items placed so that they
bias other items? Are
they positioned sequentially
or randomly?

b. Visual Effect - is the overall 
impact an appealing one? Are 
they spaced far enough apart 
or are the items crowded? 

c. Does one part of the test 
distract from another? 

d. Other?



V. CULTURAL IMPLICATIONS 





a. Are the items reflective of 
bilingual cultures? 

b. Can the illustrations and 
vocabulary be generalized to 
other cultures? 

c. Are the items "fair" to 
children who are bilingual/ 
bicultural? 

d. Other?



VI. TRANSLATIONS 



a. Are they correct? 

b. Is the vocabulary used appro- 
priate for children? 

c. Are regional differences 
in language a factor? 



d. Other?



VII. TIMED TESTS

a. How significant is the
competitive factor?

b. Is the time allowed
appropriate for children?

c. Other?



VIII. SCORING PROCEDURES

a. Are the results meaningful?

b. Are the results clear?

c. Do the scores/results help the
teacher to understand and help
her students?

d. Other?

IX. OTHER CONSIDERATIONS

a. Length of test by subsection
and total: is it fatiguing to
children?

b. What population was the test
normed on?

c. How large was the norming
population?

d. Does the test appear to be
used the way in which it was
intended by the author?

e. Other?



X. SUMMARY

a. Does this test effectively
evaluate the success of a
bilingual program?

b. Does the test effectively
evaluate the potential of
bilingual/bicultural children?

c. Does the test effectively evaluate
what a child has learned in
bilingual education classes?

d. Does the test effectively
evaluate a bilingual child's
I.Q.?

e. Does the test effectively
evaluate the weaknesses of a
bilingual program?

f. Does the test reveal to the
teacher how she may improve
her teaching?

g. Is taking this test of
positive value to the child?

h. Is taking this test of negative
value or harmful to the child?

i. Other?





BABEL TESTING & ASSESSMENT WORKSHOP 
POSITION STATEMENT 



I have reviewed the ______ test
(please note level & form)
in terms of its appropriateness for use in evaluating bilingual/
bicultural children and bilingual/bicultural programs.

_I endorse its continued use for bilingual populations.

_I endorse its continued use only for individual
diagnostic purposes, on special children with certain
learning difficulties.

_I cannot give an opinion on this instrument
(explanation attached).

_I urge that this instrument be discontinued as an
evaluative tool for bilingual/bicultural populations.

_This test should not be used under any circumstances
for bilingual/bicultural children.



NAME 






POSITION 



PROJECT/DISTRICT 



DATE 




RESOLUTIONS

(Drafted January 28, 1972)

1. Testing of children whose language is other than
standard English with instruments that were developed
for the user of standard English violates the norms and
standardization of those instruments and therefore
raises serious questions as to the results obtained. We,
therefore, take the position that such use of these instruments
with children whose language is other than standard
English is invalid.

2. Sufficient evidence now exists to direct us to the
development of Criterion Referenced Assessment systems as
a means of improving educational programs' accountability for
learning activities. It is imperative that these evaluation
processes be correlated with local performance objectives.

3. The development of valid test instruments for bilingual
and/or bicultural children must be directed by bilingual and/or
bicultural qualified personnel in the education field or similar
fields; otherwise the test instruments will not reflect
the particular values, skills, etc. of the ethnic or cultural
group being tested.

4. Whereas currently used standardized tests do not
measure the potential and ability of California bilingual or
bicultural children, and whereas these tests are being used
as if they do so measure, and they are relied upon to counsel,
place and track these children, this body hereby resolves
that such use of standardized tests should be immediately
discontinued.




PART II 

THE NEW INTER-AMERICAN SERIES

Prepared by Barbara Havassy, Ph.D., Consultant

For Multilingual Assessment Program

(Joe R. Ulibarri - Project Director)





The project presented or reported herein was performed
pursuant to a Grant from the U.S. Office of Education,
Department of Health, Education, and Welfare. However,
the opinions expressed herein do not necessarily reflect
the position or policy of the U.S. Office of Education,
and no official endorsement by the U.S. Office of Education
should be inferred.






The New Inter-American Series:
Tests of General Ability & Tests of Reading

Author: Herschel T. Manuel

Publisher: Guidance Testing Associates



Author's Purpose: The Series consists of two types of tests: tests of
general ability and tests of reading.

(1) Tests of General Ability: "designed to provide
an estimate of the ability to do academic
work in general... The verbal materials test
the understanding of written language and the
ability to recognize relationships among concepts
expressed by words. The nonverbal materials also
present problems of relationship among concepts,
but in these exercises the problems are expressed
by pictures or drawings with only initial verbal
directions. In the numerical materials, the
ability to think quantitatively is tested by exercises
in arithmetic computation and by exercises
in arithmetic reasoning... The tests provide an
estimate of abilities which cut across different
fields of study." Not intended as a measure of
general intelligence.

(2) Tests of Reading: These tests not only measure
achievement in reading, but form a basis "for estimating
ability to do school work in other areas
in which the ability to read is related to achievement."



Description of Series: All tests at all levels available in English or
Spanish editions.

A. Tests of General Ability

1. Pre-School Level. Individually administered in 2 periods with
picture stimulus cards. Requires no oral response. Verbal-numerical:
40 items, non-verbal: 40 items.

2. Level 1 Pretest. Grades K-1. 4 page practice test to prepare
for actual Level 1 test.

3. Level 1. Grades K-1. 80 items. A 'readiness' test. Administration
in small groups (8-12 children) recommended. Consists of:

   1. Verbal-Numerical subtest, 40 items; composed of oral
   vocabulary (part 1) and number (part 2) items.

   2. Non-Verbal subtest, 40 items; composed of association
   (part 3) and classification (part 4) items.

4. Level 1, Abbreviated edition. Grades K-1. 64 items (fewer
items in each area than the long form).

5. Level 2. Grades 2-3. 100 items. Consists of:

   1. Verbal-Numerical subtest, 60 items.

   2. Non-Verbal subtest (classification and analogies),
   40 items.

6. Level 3. Grades 4-6. 150 items, 52 min. Consists of:

   1. Verbal subtest, 50 items. Composed of sentence completion
   (part 1) and word relations (part 4). 18 min.

   2. Non-Verbal subtest, 50 items. Composed of figure analogies
   (part 2) and figure classification (part 5). 16 min.

   3. Numerical subtest, 50 items. Composed of computation
   (part 3) and number series (part 6). 18 min.

7. Level 4. Grades 7-9. 150 items, 52 min. Same format as Level 3.

8. Level 5-Advanced. Grades 10-13. 150 items, 52 min. Same format
as Level 3.

B. Reading Tests

1. Level 1. Grade 1. 80 items, 18 min.

   1. Part 1 Vocabulary, 40 items, 8 min.

   2. Part 2 Comprehension, 40 items, 10 min.

2. Level 2. Grades 2.5-3.0. 110 items, 23 min.

   1. Part 1 Level of Comprehension, 40 items, 10 min.

   2. Part 2 Speed of Comprehension, 30 items, 5 min.

   3. Part 3 Vocabulary, 40 items, 8 min.

3. Level 3. Grades 4-6. 125 items, 41 min.

   1. Part 1 Vocabulary, 45 items, 10 min.

   2. Part 2 Speed of Comprehension, 30 items, 6 min.

   3. Part 3 Level of Comprehension, 50 items, 25 min.

4. Level 4. Grades 7-9. 125 items, 41 min. Same format as
Level 3.

5. Level 5. Grades 10-13. 125 items, 41 min. Same format as
Level 3.






Format: Tests of General Ability: levels 1 through 2 consist of pictorial
items where the child marks in the test book the picture which is his
answer to the question. For certain items (oral vocabulary and numerical)
the teacher reads the question. For other items (classification
and association) the question is implied by the pictorial representation.
Levels 3 through 5 involve printed questions in a test booklet with the
answers to be marked on a separate answer sheet.



Where Used: According to the most recent records, the tests of this series
are being used at 24 Title VII Spanish bilingual program sites.
These are:

Compton, California
Healdsburg, California
Olivehurst, California
Redwood City, California
Salinas, California
Denver, Colorado
Naples, Florida
Chicago, Illinois
Boston, Massachusetts
Springfield, Massachusetts
Albuquerque, New Mexico
Las Cruces, New Mexico
New York City, New York
Rochester, New York
Abernathy, Texas
Austin, Texas
Del Rio, Texas
Houston, Texas
La Joya, Texas
Laredo, Texas
McAllen, Texas
San Antonio, Texas
Zapata, Texas
Milwaukee, Wisconsin



Technical Data 



Development of Spanish and English Editions of the Inter-American Series

The description to follow is a summary of the test author's claims about the
development and intent of the Series. It does not reflect the reviewer's
judgement about the test development, motivation for the Series, or content
of the items. In a later section the reviewer will take issue with some of the
claims made by the test author about the Series.




The uniqueness of this series of tests stems from the fact that there
are so-called parallel forms, one in English and one in Spanish, yielding
comparable scores. The entire Series of tests grew out of a study of
teaching English in Puerto Rico in the 1940's. Both language editions of
the tests were originally developed in Puerto Rico, after the notion of
translating English tests into Spanish was rejected. According to the
author(s) an attempt was made to develop parallel forms by bringing together
native Spanish- and native English-speakers to construct the tests. The
objective of this procedure was to select test items common to the two cultures
and of similar difficulty. The tests were first published in 1950.
The Inter-American Series to be examined here is the most recent version.
Part of it was published in 1962 and part in 1966.

The goal in construction of the test pool items for both language
editions was to create items with the following characteristics:

1. Items common to, but not necessarily of the same frequency in,
the cultures of the Spanish-speaking and English-speaking
peoples of the Western Hemisphere.

2. Use of the same pictures, drawings and numbers in the
non-language parts of the test booklets.

3. Use of the same directions and same verbal content, expressed
for one edition in standard English and for the other in
standard Spanish of similar difficulty. The test developers
claim that the Standard Spanish and English avoid local idioms
as much as possible and that the tests have been designed for
use without significant change wherever they may be administered.
Items from the item pool selected for inclusion in the test were chosen
in the following way. Spanish items were administered to Spanish-speaking
children and English items to English-speaking children. The relative difficulty
of each item was examined and its discrimination between the more and
less able groups, as determined by total test scores, was noted. Only those
items which discriminated between the more and less able groups in both
language groups and which conformed to the previously mentioned specifications
were selected for the published edition.
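
The item-selection procedure described above can be illustrated with a small sketch. It rests on several assumptions not stated in the source: the more and less able groups are formed by a simple half split on total score, difficulty is taken as the proportion answering correctly, discrimination is the difference between the two groups' proportions, and the function name `item_stats` and the sample data are hypothetical.

```python
def item_stats(responses, item):
    """Return (difficulty, discrimination) for one item.

    responses: list of per-examinee lists of 0/1 item scores.
    Difficulty is the proportion of all examinees answering correctly.
    Discrimination is the proportion correct in the higher-scoring half
    minus the proportion correct in the lower-scoring half (an assumed
    definition; the source does not give the authors' formula).
    """
    totals = [sum(r) for r in responses]
    order = sorted(range(len(responses)), key=lambda i: totals[i])
    half = len(order) // 2
    lower, upper = order[:half], order[-half:]
    difficulty = sum(r[item] for r in responses) / len(responses)
    p_upper = sum(responses[i][item] for i in upper) / half
    p_lower = sum(responses[i][item] for i in lower) / half
    return difficulty, p_upper - p_lower


# Hypothetical scores for four examinees on three items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
difficulty_1, discrimination_1 = item_stats(responses, 1)
```

Under the procedure as summarized, an item would be retained only if it showed acceptable discrimination when this calculation was run separately on the Spanish-speaking and the English-speaking samples.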



With the history of the development of the Series in mind, norms, reliability
and validity and other technical aspects of the tests can be examined.

Norms

The test author and publisher take a unique position with respect to
normative data on the Series. They recommend that the tests be used with
regional or local norms (as contrasted with national norms) "to be prepared
by those who use the tests." With respect to the original sample on whom
the Series was developed, there is little information beyond the fact that
it contained English and Spanish-speaking children, presumably in Puerto
Rico.

What the author and publisher do provide are: (1) some norms, presented
incidentally, based on data provided by some test users; (2) some estimations
of norms based on calibration of the Series with other standardized tests
with published norms (equivalent scores method) also provided by test users;
(3) detailed instructions with respect to developing local norms and to
calibrating the Series with other tests. Some of the tests which have been
calibrated with the Series are:



Tests of General Ability, Level 1 with Goodenough-Harris Draw-A-Man

Tests of General Ability and Tests of Reading, Level 5, English,
Form CE with some Project Talent tests

Tests of General Ability, Level 5 with one administration of the
College Board Scholastic Aptitude Test at University of Texas

Estimation of "national" norms for Levels 3, 4, 5 of both ability
and reading tests through calibration with various Educational
Testing Service Tests.

Some of the Spanish edition of the Series with a test d[illegible]
by the Puerto Rico Department of Education.

Percentile scores yielded by various levels of the test in some Spanish-speaking
countries (e.g. Mexico, Panama, Venezuela, Costa Rica, Chile) are
also provided.

, \ • ' ■ . sv \ 

In examining the section of the manual which deals with the normative
data, it becomes clear that there is a great deal of space devoted to normative
performance. However, it is impossible to summarize or conclude anything
from the data for they are in no way systematic. They are merely the
performance of various groups, from various parts of the world, on various
tests of the Inter-American Series. Furthermore, there are also no evaluations
of the tests with which the Series was calibrated. From a practical
point of view, little of the data provides a potential user with any helpful
information. An individual test user should be prepared to construct his
own norms.
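
For a user left to construct local norms, one common starting point is a percentile rank computed against locally collected scores. The sketch below is a generic illustration and not the procedure given in the test manual: the function name `percentile_rank`, the sample scores, and the convention of counting half of the tied scores are all assumptions.

```python
def percentile_rank(score, local_scores):
    """Percentile rank of a raw score within a local norming sample.

    Counts every score below the given score plus half of the scores
    equal to it, expressed as a percentage of the sample size.
    """
    below = sum(1 for s in local_scores if s < score)
    equal = sum(1 for s in local_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(local_scores)


# Hypothetical raw scores from a small local sample.
local_scores = [10, 20, 20, 30]
rank_of_20 = percentile_rank(20, local_scores)
rank_of_30 = percentile_rank(30, local_scores)
```

A real norming sample would need to be far larger and drawn from the population the test will actually be used with, which is the burden the reviewer notes falls on each test user.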
Reliability

The indices of reliability provided in the manual are based on administrations
of the two forms of the test (CE and DE) to the same groups of
children after a "relatively short interval." Rather than reproduce the
pages of tables illustrating the reliability coefficients of the various
levels of the test, these coefficients will be summarized by noting the range
of these coefficients.
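
An alternate-forms reliability coefficient of the kind described above is conventionally the correlation between the same children's scores on the two forms. The sketch below assumes a Pearson product-moment correlation, which the source does not name explicitly; the function name `pearson_r` and the sample scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between paired score lists x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of the same three children on the two forms.
form_one = [1, 2, 3]
form_two = [2, 4, 6]
reliability = pearson_r(form_one, form_two)
```

Values near 1.0 indicate that the two forms rank children almost identically; the ranges reported for the Series can be read against that scale.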



Tests of General Ability, English Edition

Level 1    0.57 to 0.89
Level 2    0.53 to 0.82
Level 3    0.67 to 0.90
Level 4    [illegible] to 0.82
Level 5    not [illegible]

Tests of General Ability, Spanish Edition

Level 1    0.45 to 0.89
Level 2    0.49 to 0.80
Level 3    0.74 to 0.83
Level 4    0.74 to 0.88
Level 5    0.76 to 0.90

Tests of Reading, Spanish Edition

Level 1    0.79 to 0.86
Level 2    0.42 to 0.74
Level 3    0.64 to 0.90
Level 4    0.65 to 0.87
Level 5    0.48 to 0.82

Tests of Reading, English Edition

Level 1    0.84 to 0.95
Level 2    0.65 to 0.90
Level 3    0.78 to 0.95
Level 4    0.72 to 0.[illegible]1
Level 5    0.74 to 0.93

Validity



There is no direct presentation or examination of the validity of any
aspect of the Inter-American Series in the published manuals. One must
infer the answer to the question of validity of these measures from material
presented as correlations of the Series with other tests. Though not
presented as material from which to infer validity, the correlational
material is massive (it comprises 22 pages in the technical manual). If it
is not intended that validity be inferred from this material, then it must
be said that the author and publishers of this Series of tests have
presented absolutely no consideration of the validity of their instrument.
A list of the tests with which certain levels of the Series have been
correlated follows:

 1. Goodenough-Harris Draw-A-Man            (with English edition only)
 2. Metropolitan Readiness Test             (with English edition only)
 3. Otis Quick-Scoring Mental Ability
    Test, Alpha                             (with English edition only)
 4. School and College Ability Tests
    (SCAT)                                  (with English edition only)
 5. Differential Aptitude Tests             (with English edition only)
 6. Metropolitan Reading Test               (with English edition only)
 7. Stanford Achievement Tests, Primary
    II, Reading                             (with English edition only)
 8. California Mental Maturity Test         (with English edition only)
 9. California Achievement Tests,
    Sequential Tests of Educational
    Progress (STEP)                         (with English edition only)
10. STEP Reading                            (with English edition only)
11. Project Talent tests                    (with English edition only)
12. College Board Scholastic Aptitude
    Test (English and Spanish)              (with English and Spanish editions)
13. Iowa Tests of Basic Skills              (with English edition only)
14. Metropolitan Achievement Tests          (with English edition only)
15. SRA Achievement Tests                   (with English edition only)
16. Iowa Tests of Educational Development   (with English edition only)

The Tests of General Ability and Tests of Reading have also been correlated
with each other and with teachers' [...]



The inadequacy and irresponsibility of this attempt at test validation
cannot be over-emphasized. There is no attempt to generate a systematic
defense of the validity of the Series or its psychometric structure. There
is not even a reference made to the theoretical underpinnings of the Series,
i.e., why certain questions were thought to be indicative of general or
reading ability. The correlations with other tests (as listed above) do not
fulfill any criteria for validity.

Furthermore, though the list of tests with which the Series has been
correlated is lengthy, it provides almost no information as it is not a
systematic correlational procedure. The correlations are based on data from
different levels of the Series collected on highly varied samples located
in different geographical areas. Sometimes the other test is correlated with
several levels of the Series, sometimes with only one level of the Series.
Thus the data do not stand as a coherent entity. Finally, to infer validity
of one test from its correlation with a second test implies the validity of
the second test. And, as the test developers of the Series do not provide
any information with respect to the validities of the other tests, the
provided correlations are meaningless. That validity is not a requisite for
published tests is a well-known fact. It lends additional support to the
contention that the correlations of the Series with other standardized tests
are a meaningless gesture.

Evaluations

A. The Center for the Study of Evaluation (UCLA), on a three-step
continuum from good to poor, rated several levels of the Series for use with
first, third, fifth, and sixth grades in the following way:

Tests of General Ability

(Ten subtest columns, in printed order, covering Grades 1, 3, 5, and 6; the
subtest labels are Verb., Non-Verb., Num., and Total.)

Measurement Validity         poor fair fair fair fair fair fair fair fair fair
Examinee Appropriateness     fair fair fair fair fair fair fair fair fair fair
Administrative Usability     fair good good fair fair fair fair fair fair fair
Normed Technical Excellence  poor fair fair poor fair poor fair fair poor fair

Tests of Reading

                             Grade 1      Grade 5      Grade 6
                             Vocabulary   Vocabulary   Vocabulary

Measurement Validity         fair         fair         fair
Examinee Appropriateness     fair         fair         fair
Administrative Usability     fair         fair         fair
Normed Technical Excellence  poor         poor         poor



B. Summaries of reviews from The Fourth Mental Measurements Yearbook
(Buros, 1953).

Tests of General Ability



Drake. Drake indicates general disapproval of the tests and advises
that they only be used with extreme caution. He feels the only justification
for the tests, in light of the many standardized tests of capacity, is their
parallel Spanish and English forms. Even though the test is claimed to be
culture-free, he questions this aspect of the test. Drake wonders if the
tests are really culture-free and if both editions are equivalent in
difficulty. If they are, then why, according to the provided median scores,
is the ability of children of the United States greater than that of the
children of Mexico, which in turn is greater than the capacity of the
children of Puerto Rico? Drake questions these results and further asks if
one can assume the sampling was equivalent in all three countries, if the
motivation of the children was equivalent in all three countries, et cetera.

Durost. This review is not very helpful as it refers to information
not currently available to the public: information privately obtained
by Durost from the test author and/or publisher, information referred to in
the review as graduate work being conducted at the University of Texas, or
information which is only in older versions of the test manuals.
Furthermore, several of Durost's comments are unclear. For example, in
dealing with the validity of the test Durost says: "From the point of view
of validity, it seems clear that these tests are superior to tests currently
available in the United States for measuring mental ability for
Spanish-English groups." One questions the validity data on which this
statement is based and, further, what a "Spanish-English" group is.

With respect to validity, Durost's summary position is that the validity
data, i.e., the correlations of the test with achievement tests, fall
within the typical range of such values, and that they provide no
basis for thinking these tests are better than others. His criticism with
respect to the tests' norms is that they are of little practical use.
Concerning mechanical details, Durost indicates that the art work of the
test is not very good, that sometimes the intent of the picture is hard to
determine, and that the separate answer sheet is awkward. In concluding his
review, Durost says that there is nothing about the Tests of General Ability
that would cause one to use them in place of widely-used standardized IQ
measures. However, he indicates that their use with bilingual children at
borders of English- and Spanish-speaking countries is certainly desirable.
He feels that the test represents the best that is available for use in
Spanish-speaking countries. Durost ends his review with the hope that
additional research will be conducted on this test.

Tests of Reading

Orleans. Many of Orleans' criticisms appear to be specific to the
earlier version of the Tests of Reading. Nevertheless, those remarks
addressed to the validity of the tests and to the pictorial presentation
appear to be still valid and will be discussed below.

First, Orleans is concerned with the validity of the tests and the
context in which they are presented. He questions the context of both the
English and Spanish editions. More importantly, he questions whether a
context appropriate for measuring reading achievement "in English of
American children" is also appropriate for measuring reading achievement in
"Spanish of Spanish-speaking children." Orleans notes that there is no
supporting evidence of content validity for either edition of the test,
which further compounds the issue of validity.

As an example of the validity problem, Orleans cites an item where a
picture of a woman washing clothes is followed in the English edition by the
words wash, wake, walk, and call and in the Spanish edition by the words
lavar, despertar, andar, and llamar. In the English edition, a child is
required to distinguish between three words beginning with the same letter,
all having the same number of letters in the word, while the same is not
true for the Spanish translation. The effect of these circumstances on the
validity (and reliability) of the test appears not to have been considered
by the test authors and publishers.

As to the quality of the pictorial presentation of the reading tests,
Orleans comments that they are poor and confusing.






Westover (reviewed the English edition only). With respect to
the English edition, Westover finds fault with the following aspects of the
Tests of Reading. First, he finds the illustrations and format of a poor
quality. Second, he remarks that the vocabulary section gives the test user
little information regarding the pupil's word-recognition skills. Third, he
finds the tests do not provide enough information as they measure only two
aspects of reading: vocabulary and comprehension (this latter problem
appears to have been remedied in later editions of the test). He feels the
tests require the addition of some measure of reading speed.



Irrespective of these faults, Westover feels the tests have face
validity and that the materials are intrinsically interesting. He feels the
tests' specific value lies in their use in connection with the Spanish
edition in order to compare performance. Otherwise, he feels older and
established tests of reading have more to offer the test user, especially as
they provide more adequate norms, data concerning reliability, and
diagnostic information.



Relevance of Tests for Spanish-Speaking Populations

There are several issues which must be considered in evaluating the
appropriateness of the Inter-American Series of tests for Spanish-speaking
populations. Among the more timely of these are the following: the value of
the Series as an estimate of the ability or capacity of Spanish-speaking
children; the value of the parallel forms with respect to culture-fairness
or freedom from cultural bias; and the issue concerning the determination of
which language edition is appropriate for usage with Spanish-speaking
children of the United States.

The first of these issues concerns the accuracy of the estimate of
ability provided by Series test scores. The question concerns the accuracy
of the Series as a measuring device. Information contained in other sections
of this review (Technical Data) indicates that the Series has some very
serious deficiencies. The investigation of its technical properties, i.e.,
the reliability and validity, gives the impression of being confused,
sporadic and random and does not impart the feeling that the Series is
either reliable or valid. This feeling is borne out by ratings received
by the Series from the CSE evaluators.

Leaving these technical matters aside, the Series has some major
shortcomings on a much more basic level (which, of course, ultimately
contribute to the Series' lack of reliability and validity). These concern
the practical aspects of the test such as language and content, visual
presentation and timing.

With respect to the language, careful examination of both of the language
editions reveals the following problems. First, the directions are unclear
and stilted and the word usage is awkward. The English directions contain
such situations as the following. In the Test of General Ability Level 1
Association section, the directions state: "Now look at the hat in the next
row. Put your finger on the hat. To which of the other pictures does the
hat belong?" (Emphasis mine.) The Spanish directions, in what may be proper
Puerto Rican Spanish, are a poor choice of words from the point of view of
Southwestern United States Spanish speakers. For example, the instructions
refer to fila. Some bilingual educators point out that cuadro, linea or
cerro would be better. Also, in Level 2 Analogies (Test of General Ability)
the instructions state "Estos dos dibujos son el primer par..." One
bilingual teacher has suggested that the more appropriate Spanish phrase,
for the Southwest at least, is "Estos dos dibujos estan en pares."

The language of the stimulus materials is also of a troublesome nature.
The words are a poor sampling of words in common usage and the choice
appears [...] have the greatest probability of being exposed. For example,
Test of General Ability, Level 1, form CE, #17: "...find the warrior."
"...busquen el guerrero." Also, Test of General Ability, Level 2, CE, #22:
"...the picture which makes you think of refuge." "...del dibujo que les
haga pensar en refugio."

When examining the content of the test items (although not completely
independent of the language in which they are expressed), one again finds
this problem of situations which lack words of common usage, in addition to
ones of ambiguity (where more than one answer could be correct). For
example, in Test of General Ability, Level 2 DE, grades 2-3, #3, the correct
picture is the one of a fairy, "la hada." However, the concept of a fairy is
not a culturally appropriate one. In item 15 of the same test, the stimulus
word is unconscious, "inconsciente." The correct picture shows a man
(sleeping?) on a couch. Item 11 of the same test asks to mark "...debajo del
dibujo que les haga pensar en entrando solo," and shows two pictures of a
boy alone at a door. In one picture the boy is knocking and in the other he
is actually crossing the threshold.

Concerning the visual presentation, which is identical for both language
editions, there is criticism from many sources with respect to the poor
format. The illustrations are crowded and small. Sometimes finding the right
answer is dependent on finding a smile on a figure which is in one of 48
drawings on an 8 1/2 x 11 inch page, the smile being smaller than 1/32nd of
an inch. Furthermore, the illustrations are line sketches, leaving much to
inference and imagination. The spacing is very poor. Often in a series of
drawings for an item, each drawing involves more than one person. In these
cases it is difficult to tell which drawing the many people are supposed to
be a part of. Finally, the pictures are ambiguous, making it difficult to
discriminate between chicks and birds, cups and glasses, a book and a box of
Kleenex, et cetera.


The next issue having relevance for Spanish-speaking populations is
that of parallel forms. This is a key issue since the developers of the
Inter-American Series claim they have parallel forms: an English edition
and a Spanish edition. Examination of both versions, however, makes it
clear that the Spanish version is a straightforward literal translation
of the English version and not a parallel form. (That it is not an English
translation of the Spanish version is apparent from the nature of the
illustrations and the cultural content of the items.) Parallel tests,
technically speaking, measure the same psychological entity but utilize
different sets of operations (i.e., items or tasks). A literal translation
from one language to another does not fulfill this criterion. Given that
the parallel form notion of the Inter-American Series is rejected, the
cultural appropriateness of the Spanish edition becomes a major concern.

The cultural appropriateness of the Series is a serious issue because
it superficially appears to be appropriate, as it is in Spanish and is
claimed to be a parallel form (and not just a translation). Such claims lead
to a more ready acceptance of it by educators than of other tests with or
without a Spanish translation. Thus the Series is potentially dangerous in
that educators often assume they have chosen a valid test, given the Spanish
parallel form, and will investigate the test no further. Unfortunately, the
test does not even have much merit as a Spanish test, when one takes into
account its upper-middle class Anglo-Saxon bias and its use of Puerto Rican
Spanish.

The consideration of cultural appropriateness gives rise to the question
of which children should get which edition of the test. Should Spanish-
speaking children get the English or Spanish version? Which version should
Spanish-surnamed children get? The crux of the issue is that
Spanish-surnamed children cannot necessarily understand, speak, or read
Spanish. Children who can speak Spanish and understand spoken Spanish cannot
necessarily read it. While these are obvious truths, they are not
universally known. Some school districts give the Spanish version of the
Series to all Spanish-surnamed children and think they are being very
tolerant and culturally democratic in doing so. Other schools give the
Spanish version to all Spanish-speaking children, again with the conviction
that they are being sensitive to the needs of these children and are giving
them the maximum opportunity to perform well. But, to pass Levels 1 and 2,
it is necessary to understand spoken Spanish. To pass Levels 3 through 5,
one must be able to read Spanish. And, of course, all of the reading tests
require the reading of Spanish. Just how many Spanish-speaking children of
the Southwest can read Spanish well enough to pass tests designed to assess
as elusive an entity as their intellectual capacity?

The other side of the question is the appropriateness for children of
a Spanish-speaking culture of the English version, with its stilted
language, with its old-fashioned, Eastern U.S. wearing apparel, with its
Anglo-Saxon characters, with its ambiguous questions, and with its poor
illustrations. One must conclude that the appropriateness and value of the
Series, in any language, for any group, is questionable.






Reviewer's Remarks

In examining the Series one must be concerned about its reliability
and validity. A problem in examining a series of tests as large as the
Inter-American Series is that many poor technical properties tend to be
overlooked due to the sheer bulk of the information presented (subtests,
totals, forms, levels, etc.). Thus, although a quick appraisal of the
reliability of the Series reveals that there appears to be much data on it,
a closer examination reveals that it is impossible to summarize the
reliability data as it is unsystematic and that such work is lacking on the
reliability of all parts (and forms and levels) of the Series. A reliability
coefficient of 0.45 does not make much impact when on a page of 50
coefficients. It does make a severe impact on the life of a child who has to
take such a (sub)test. Such a coefficient is unacceptable as it indicates
the test is unstable and inconsistent.
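
To make the practical meaning of such a coefficient concrete, the following
minimal sketch applies the standard error of measurement from classical test
theory, SEM = SD * sqrt(1 - r). The formula is standard but does not appear
in the review itself; the scale standard deviation of 15 points and the
function name are assumptions chosen only for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical scale with a standard deviation of 15 points (an assumption,
# not a figure taken from the Inter-American Series manuals).
ASSUMED_SD = 15.0

# 0.45 is the lowest coefficient cited in this review; 0.90 is shown only
# for contrast.
for r in (0.45, 0.90):
    sem = standard_error_of_measurement(ASSUMED_SD, r)
    print(f"reliability {r:.2f}: SEM is about {sem:.1f} points")
```

Under these assumptions, a score from a subtest with a reliability of 0.45
carries an error band more than twice as wide as one from a subtest with a
reliability of 0.90.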

With respect to the validity, the reader is referred to the remarks
on page 9. In summary, it may be said that the test author(s) and
publishers have been grossly negligent in making available a test on such a
large scale which has no validation. One wonders what the test constructors
thought they were doing.

The problems arising from the lack of investigation of reliability
and validity are greatly magnified by the existence of the alleged Spanish
parallel form. This form, in combination with its availability in levels
covering preschool to grade 13, makes the Series highly attractive to
educators. However, in light of the fact that the Spanish form is not
parallel, that the Spanish language usage is poor, and that the reliability
and validity are so lacking, one can see what a deceptive test the
Inter-American Series actually is.



PART III

ABSTRACT

of

"A SYSTEM FOR CRITERION-REFERENCED ASSESSMENT OF A BILINGUAL
CURRICULUM" by Eduardo A. Apodaca, Director, Project Hacer Vida,
Title VII Bilingual Education

This pioneering effort by the staff of Title VII Bilingual
Education Project "Hacer Vida" began in April of 1971. At the
time, no testing alternatives existed for a project that was
dissatisfied with standardized tests. The choice was always one
of which standardized measure would be used. The overwhelming
majority of these instruments are designed to measure competencies
in the English language.



Another inconsistency in the initial evaluation method
used resulted from trying to measure the achievement of performance
objectives through the use of standardized instruments. There
was a lack of correlation between what the tests were testing
and what the teachers were actually teaching. It came as no
surprise to anyone when the six participating superintendents
voted to eliminate all standardized tests from the 1971-72
Evaluation Design. The Berkeley Conference presentation is in
effect a "blow-by-blow" description of the events that have
been experienced by project personnel in designing an evaluation
alternative to norm-referenced measures.






The "Hacer Vida" Criterion-Referenced Model has
implications not only for other bilingual education projects but also
for traditional programs. The project is in the process of
implementing an instructional system based on performance
objectives for both the Spanish and English curriculum.

Project staff are participating in the design of
instruments that actually can test what is being taught.
English Criterion-Referenced Instruments have been developed
for first and second grades in the areas of Language Arts and
Math. A Spanish Criterion-Referenced Instrument has also
been created for use in both first and second grades. Teachers
involved in this effort have been continuously refining their
product.



One of the most valuable "spin-off" benefits has been the
participation by teachers in determining what accountability
model they will have to teach by. Teachers in the program have,
in effect, designed the tests they are being evaluated by.









A unique feature of this criterion-referenced assessment
model is the utilization of a student assessment card based on
the McBee Keysort System. As students accomplish objectives,
their card is punched. A group of 30 cards can be easily sorted
with a needle to pull out groups of students that have not met
the desired objective.
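
The needle sort just described can be sketched in a few lines. This is only
an illustration under stated assumptions: the student names and objective
codes are hypothetical, and each card is modeled as the set of objectives
whose holes have been punched open. It assumes the usual edge-notched card
behavior, in which a punched card drops off the needle, so the cards that
stay on the needle belong to students who have not yet met the objective.

```python
from typing import Dict, List, Set

# Hypothetical deck: each card maps a student to the objectives already
# punched on that student's card. Names and objective codes are invented.
Deck = Dict[str, Set[str]]


def needle_sort(deck: Deck, objective: str) -> List[str]:
    """Return the students whose cards stay on the needle.

    A punched card drops off the needle, so the cards that remain are
    those of students who have not yet met the objective.
    """
    return sorted(name for name, punched in deck.items()
                  if objective not in punched)


deck: Deck = {
    "Ana": {"LA-1", "LA-2", "M-1"},
    "Luis": {"LA-1"},
    "Rosa": {"LA-1", "M-1"},
}

print(needle_sort(deck, "M-1"))   # ['Luis']
print(needle_sort(deck, "LA-2"))  # ['Luis', 'Rosa']
```

One pass of the needle per objective is enough to regroup a class for
reteaching, which is presumably what makes a 30-card deck practical for a
classroom teacher.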






Another component of the evaluation model is the
performance objective box. First grade teachers worked as
a team in compiling their own box of objectives. Second
grade teachers worked in similar fashion to organize theirs.
The steps that were taken in coming to agreement on a set of
objectives were: A. Research and review of all available
performance objective models, such as the IOX Objective Bank;
B. Selection of objective clusters; C. Concurrence on final
selection; D. Revision of selected objectives onto the
"Hacer Vida" Objective Card Format; E. Identification of
optional procedures that could be used in teaching each
objective; F. Citing textbook references on each objective
card to link each objective with appropriate lessons.

A publication entitled "A System for Criterion-Referenced
Assessment of a Bilingual Curriculum" by Eduardo A. Apodaca
is currently available at a nominal fee from Title VII Project
Hacer Vida, Office of Riverside County Superintendent of Schools,
46-209 Oasis St., Indio, Calif. 92201. Statistical information
on the criterion-referenced instruments will be available in
the 1971-72 Final Evaluation Report, to be published by August
1972.

Eduardo A. Apodaca, Director
Project Hacer Vida, Title VII-ESEA
Office of Riverside County Superintendent of Schools
46-209 Oasis St.
Indio, Calif. 92201
(714) 347-8511 ext. 313








PART IV

SOME CAUTIONARY NOTES ON ATTEMPTING TO ADAPT IQ TESTS
FOR USE WITH MINORITY CHILDREN AND A
NEO-PIAGETIAN APPROACH TO INTELLECTUAL ASSESSMENT:
PARTIAL REPORT OF PRELIMINARY FINDINGS*

Edward A. De Avila

Multilingual Assessment Program
(Joe R. Ulibarri - Project Director)



Traditional tests of intelligence are inappropriate for the minority
child. They are particularly inappropriate for those who come from non-
English speaking backgrounds. Such diverse groups as the popular press,
the courts, civil rights organizations, as well as state and federal
agencies have all been involved in pointing to the failure of the testing
industry to fully consider the cultural and linguistic differences of
minority children when constructing, publishing and selling these tests.

Since the industry stands to gain increased revenues through the use
of its materials in federally-supported programs, it has responded to this
criticism by:

1) translating existing intelligence tests for non-English speaking
   children

2) adjusting norms for ethnic sub-groups

3) attempting to construct culture-free tests

There are distinct problems with each of these approaches.



*Acknowledgement

The project directors would like to acknowledge their special indebtedness
to George McCormick, Principal of Haselton School in Stockton, California.
Special thanks are also given to Gregorio Klos and Toni Castillo, who
assisted in the data collection, to Marvin Hanely for his special help in
analyzing the data, to the entire staff of the Multilingual Assessment
Program, and to Stanley Prance, who assisted in the preparation of the
manuscript. Finally, we would like to extend our deep appreciation to Juan
Pascual-Leone of York University in Toronto, Canada for the use of his
Figural Intersection and Water Level tests.









With respect to translation, several problems arise. First, regional
differences within a language make it difficult either to use a single
translation or to compare across different translations. Thus while the
word "tostone" refers to a quarter or a half dollar for a Chicano child,
for a Puerto Rican it refers to a squashed section of banana which has
been fried. Second, the assumption that non-English speaking children
speak one language exclusively leads to monolingual translations which,
in many cases, are not related to the actual spoken language of the child,
which may be a combination of languages. This assumption leads to the
further assumption that, because a given language is the spoken language,
it is also the written language. One finds many examples of tests
written in Spanish being given to Chicano children who may speak Spanish
but who have had absolutely no prior instruction in reading Spanish.
Third, another problem in translating tests is that words in one language
have frequencies and potencies which generally cannot be compensated for
in a direct translation to a second language. In other words, having a
cognate is no guarantee that it is used in the second language with the
same frequency as it is used in the first language. For example, the
word "pet" is a common word in English, yet its Spanish cognate, "animal
domestico," is almost never used. A related problem in this context has
to do with the fact that translating a word from one language to another
can vastly alter its meaning. Thus, there are wide varieties of seemingly
harmless English words which translate into Spanish swear words or
"palabras verdes." This being the case, although translating a large egg
into a "huevon" may satisfy grammatical requirements and seem harmless to an
Anglo translator, it nevertheless fails to consider that portion of the
word's meaning which "does not translate." Fourth, straightforward
translations of existing tests represent a complete denial of cultural
differences. In many cases, this leads not only to unfair tests but to tests
which require the child to break from his own cultural tradition. Thus,
asking an Indian child "who discovered America" or asking a Moslem child to
"draw a man" requires not only that the child break with cultural and
religious tradition but also that he set himself apart from his own
reference group.

The second major response of the testing industry to criticism with
respect to the testing of minority children has been to establish regional
and ethnic norms; in other words, simply to lower the criterion levels on
the basis of ethnicity. This leads to expecting less from the brown, black
or lower socio-economic white students than from the middle class Anglo
child. Awarding "bonus points" to minority children to compensate them
for their "deprived background" is based on the same proposition as
lowering norms. It is nevertheless a simple-minded solution to gratuitously
award Chicano children extra points "because they speak a little Spanish."

These practices are all based on the common notion that ethnic norms
should be established. Such practices are potentially dangerous because
they would provide a basis for invidiously determined comparisons between
different racial groups. The tendency would then be to assume that lower
scores are ultimately indicative of lower potential, which would not only
continue the self-fulfilling prophecy of lower expectation for minorities
but would also reinforce the genetic inferiority argument advanced by
Jensen (1958), Shockley (1971) and others.
















Third, there is a problem which cuts across these issues and which in
many cases may negate attempts to "clean up the tests." This problem
involves validating a test of intelligence by correlating it with measures
of achievement. The assumption is that the brighter the child, the
greater his achievement. This appears reasonable enough, for certainly
if a child has a high capacity, it must be related to some sort of
achievement. With respect to the minority child, however, the relation
between intelligence and achievement breaks down. It is a notorious fact
that traditional curriculum has little relevance to the minority child. As
such, any attempt to validate intelligence tests for these children by
relating them to traditional curriculum is doomed to failure because a
bright Chicano or Black child does not necessarily thrive on a curriculum
designed for a mid-western Anglo population.

The fourth major difficulty in the testing of non-Anglo children is
the false assumption that a test can be constructed which is independent
of culture. Such a test is difficult if not impossible to construct.
Consider that a culture must inevitably be defined by a particular set of
referents. Intellectual activity must perforce refer to the manipulation
of these referents. As such, intellectual activity or any mental operation
must involve the processing of information, that is, referents defining an
environment or culture, which by definition defines that particular culture.
Aside from the problems inherent in depicting a culture without a referent,
to ignore this problem would be to recapitulate the problems in Descartes's
assumption that objectless thought (thought without a referent) is possible.

Over and beyond these problems, an analysis of the content and format
of items used in a large number of traditional IQ tests reveals several
highly interrelated types of items suggesting that the tests are measuring
something other than that for which they were designed. Traditional IQ
measures may therefore also be described as measures of socialization,
productivity or level of aspiration, specific experience and endurance.
Consider the following as only a few of the possible illustrations that can
be mentioned.¹




Socialization. Items of this type draw primarily on the nature of
one's socialization and are couched in such a way as to actually be measures
of the child's family value system. The referent system is, of course, the
dominant Anglo middle class. The confounding effects of this problem are
particularly evident in the "comprehension" scale of the Wechsler (WISC),
where children are asked such questions as:

"What is the thing to do if you lose one of your friend's toys?" or

"What is the thing to do if a fellow much smaller than yourself starts
a fight?"

Allowing for the stilted manner in which the question is phrased and assuming
that the child knows all of the vocabulary, it still seems perfectly
obvious that this type of question has little or nothing to do with a child's
ability to process, manipulate or code information but rather with whether
he has been socialized under the particular ethical system implied by the
question.

Productivity or level of aspiration. Many tests confound what they
hope to measure with a measure of productivity or level of aspiration. For
example, in a large number of tests the child who produces the largest number
of responses is rewarded, whereas the child who (for whatever reason)
produces fewer is punished by receiving a lower score. Thus, in the
Draw-A-Man, the child who produces the more elaborate figure receives
the higher score. The problem here stems from an assumption that all
subjects will produce as many responses as they are able, i.e., have the
same level of aspiration. The effects of this assumption are particularly
evident in timed tests, which constitute the majority of published
tests. In these tests children are required to "work quickly and
efficiently," without regard for the child who is simply not in a hurry
nor particularly motivated to be so.

Another type of test which may be grouped under this category is
the "endurance" test. This particular type of test, for purposes of
boosting statistical reliability, requires that the child answer a large
number of questions which vary little in content. This problem is
particularly evident in group tests such as the Lorge-Thorndike
Intelligence Test and the California Test Bureau Series.

Experience of specific learning. In tests which require subjects
to answer questions of fact, there is an implicit assumption that the
children taking the test will have had a more or less even chance of
having been exposed to the fact being tested by the question. The
spuriousness of this assumption is witnessed by any number of examples where
children are asked questions of vocabulary. Granted a high positive
correlation between intelligence and vocabulary, it is impossible,
nevertheless, to determine whether a minority child has missed a test item
because he lacks the capacity to understand a given word or because he
simply has never been exposed to the word, e.g., "nitroglycerine" (in
the WISC), "fire hydrant" (in the Betty Caldwell and Peabody) or "crevice"
(in the Otis-Lennon).

The fundamental problem with most of the tests mentioned above, and
indeed IQ tests in general, is that test publishers have failed to fully
consider the problems associated with testing the minority child.
Moreover, it would seem that the attempts to deal with these problems by the
above-mentioned means will lead to limited success for the reasons
discussed. However, since the results of tests are used to determine the
educational and, by extension, economic and social future of school-age
children, it behooves test publishers to more fully consider
the minority child's cultural background. A publisher who has considered
cultural background would know, for example, that the Chicano child is
reluctant to guess when he doesn't know the answer to a question; that
the Indian child is taught in the spirit of cooperation rather than
competition and is reluctant to compete with his peers; that Black, Chicano
and Indian children have little experience in developing test-taking
strategies which would enhance their performance; and finally, that there
are a significant number of children from all of these groups who view
the schools as threatening, hostile and alien.

In summary, it may be said that the major problem in the psychometric
approach to intelligence testing described in the previous notes is that
environmental factors such as linguistic and cultural differences have
not been taken into account. The position to be taken here, in contrast
to the psychometric approach, would argue, in agreement with Piaget,
that the determination of intelligence must be studied through the
examination of intra-individual rather than inter-individual approaches.
Thus, in the present view, intellectual development is characterized by the
extent of internal control of functioning versus external control of
functioning at any given stage of development.

With the understanding that testing procedures must distinguish
between external-environmental and internal-developmental variables, the
determination of a subject's intellectual development thus becomes a
two-step process. In the first step it becomes necessary to remove the effects
of these external factors before actually testing the subject. The second
step involves a determination of the extent of internal variables through
the use of tasks which vary in the degree of control required to produce a
correct response.

The use of an "experimental repertoire control" (ERC) provides for the
control of external variables which can reflect diverse experiential and
stylistic differences rather than differences in intellectual capacity or
internal control of functioning. The application of a controlled repertoire
in which subject differences are removed through pretraining procedures has
been attempted by Pascual-Leone & Smith (1969), Pascual-Leone (1970) and
De Avila (1971).

Pascual-Leone (1970) used a variety of the Piagetian tasks and the
Witkin et al. (1962) measures of field dependence-field independence in
a factor analytic study of cognitive development and cognitive style. An
essential feature of Pascual-Leone's procedures is that prior learning is
used as a control variable rather than as a dependent variable (see
Pascual-Leone & Smith, 1969). Using prior learning as a control,
Pascual-Leone (1969) found highly stable results from a number of Piagetian
tasks. In discussing the failure of previous experiments to obtain high
correlations among Piaget's tasks, Pascual-Leone (1970) notes that these poor
results may be due to (1) poor reliabilities caused by the small number
of items per test, (2) failure in "relevant linguistic pretraining," and
(3) failure to note that subjects do not always function at their
"structural" or highest level of operativity.

In another study by De Avila (1971) using upper-middle class children,
it was found that when the background of the subjects was controlled through
the use of experimental control tasks, low correlations were found between
a standardized intelligence test (the Otis-Lennon) and a number of Piagetian
tasks. Such results imply that the IQ measure may be highly related to
external variables such as educational and social background. Moreover,
these findings suggest that when those factors are controlled for through
pretraining, IQ ceases to be an adequate measure of intellectual development.
Replication of this finding with low socioeconomic subjects would support
this position. More important to the current research is the purpose of
establishing the reliability and construct validity of the current measures
with respect to the Piagetian developmental hypothesis.

A second major purpose of the present study, which replicates and
expands the largely unpublished extensive results of Pascual-Leone, Parkinson
and De Avila at York University and/or Boulder, Colorado, was to examine the
psychometric properties of several Piagetian tasks which vary according to
the extent to which external variables are controlled. The third purpose
to which this research is directed is the issue of group administration of
Piagetian tasks. Educational situations usually require group testing
because of the large number of subjects involved relative to the manpower
available. Piagetian tasks have historically been individually administered.
However, Dodwell (1961) and Barker (1960) have shown that the child's
conception of number can be tested in a group setting; De Avila et al. (1969,
1968) have measured several conservation tasks and spatial perspective
problems in group situations. De Avila et al. (1969) found adequate
reliabilities for the conservation of substance and egocentricity measures,
suggesting the further possibility of using Piagetian-based group measures
to evaluate the developmental-psychometric properties of tests which are
applicable across a broad range of development. Similarly, Pascual-Leone
(1969) and Pascual-Leone & Parkinson (unpublished) have adapted a number of
Piagetian and neo-Piagetian tasks to group settings with a high degree of
success.

The goals of the present research were thus:

1. To examine some of the relationships between the neo-Piagetian
approach to developmental scaling and traditional approaches
embodied in psychometric testing.

2. To test the applicability of the "experimental repertoire control"
(ERC) concept as a procedure for testing minority children.

3. To test the feasibility of using Piagetian measures to determine
the developmental levels of minority children.

4. To examine the psychometric properties of the Draw-A-Man and
Columbia Mental Maturity Scale for minority children.

5. To examine the relationship between developmental and I.Q.
analysis procedures for minority children.

Instruments 

Four neo-Piagetian tests were given: the Cartoon Conservation Scales
(De Avila, 1968a; 1968b; 1969), the conservation of the
horizontality of water as measured through the Water Level Task (Pascual-
Leone, 1966; 1970; Pascual-Leone & Parkinson, unpublished), the Figural
Intersection task (Pascual-Leone, unpublished; Pascual-Leone & Smith, 1969),
and the Serial Task (De Avila, 1971). In addition, two standard measures
of intelligence, the Columbia Mental Maturity Scale and the Draw-A-Man,
were also used. Each of these measures is briefly described below.



CARTOON CONSERVATION SCALES (CCS) 

Several measures of Piaget's conservation tasks were assessed by
means of the cartoon format developed by De Avila et al. (1968a; 1968b;
1969). In De Avila's procedure, three cartoon frames are presented in
which two children discuss a Piagetian task. In the first frame an equality
is established between two objects according to the dimension being studied
(i.e., number, length, substance, etc.). In the second frame an identity
transformation is depicted and in the third frame the question of
conservation of equivalence is asked. On the right side of the panel three
possible answers are presented. The three alternatives, which show the
characters responding to the question, are randomly ordered as to correctness
in order to avoid position effects. Similarly, wording is altered from item
to item in order to avoid the possible effects of acquiescence. Background on
the conservation scales and an illustration of the dialogue from each scale
are presented below.

In its current form the CCS consisted of thirty cartoon panels.
There were six examples of five tasks. The panels were presented to the
subjects and the story line was read and elaborated upon in order to
facilitate understanding of the question. The subject's task was simply to mark
the one (alternative) "that makes the story true."
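The layout just described (five tasks, six examples each, three alternatives per panel in random order) can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used to assemble the CCS: the task labels, the foil names, the seed, and the helper names `build_answer_key` and `score_sheet` are all assumptions introduced here.

```python
import random

# Task labels are assumed for illustration; the source only says
# "six examples of five tasks".
TASKS = ["number", "substance", "surface", "weight", "egocentricity"]
EXAMPLES_PER_TASK = 6


def build_answer_key(seed=0):
    """Map each (task, example) panel to the position (0-2) of the correct
    alternative after shuffling, so correctness is not tied to position."""
    rng = random.Random(seed)
    key = {}
    for task in TASKS:
        for example in range(1, EXAMPLES_PER_TASK + 1):
            alternatives = ["correct", "foil_a", "foil_b"]  # hypothetical labels
            rng.shuffle(alternatives)
            key[(task, example)] = alternatives.index("correct")
    return key


def score_sheet(marks, key):
    """Count the panels on which the marked position matches the key."""
    return sum(1 for panel, position in marks.items() if key.get(panel) == position)


key = build_answer_key(seed=42)
print(len(key), score_sheet(dict(key), key))  # 30 panels, all marked correctly
```

Keeping the shuffled position in a separate answer key means the same scoring routine works however the alternatives happen to be ordered on a given panel.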

Conservation of number is measured by showing blocks on a table.
The dialogue is as follows: Frame One: "How many blocks are there?"
Frame Two: "There are seven in each row. I'll put these in a bunch."
Frame Three: "Are there fewer in the row than in the bunch?" There are
three possible responses from which the child chooses his answer. Each
alternative provides the child with a written answer and an accompanying
picture (i.e., the child points to one, the other, or to both sets of
blocks). As in all cases the child simply picks his answer by putting an
"X" on the picture "that makes the story true." (See example 1)

Conservation of substance is measured through items such as the
cartoon where the following dialogue takes place. Frame One: "These two
clay balls are the same size." "They both have the same amount of clay."
Frame Two: "I'll roll one into a long hot dog shape." Frame Three: "Does
one have more clay than the other one now?" In the response frames the
responses are: (boy points to both) "They have the same amount"; (boy
points to hot dog) "The hot dog has more"; (boy points to ball) "The
ball has more." (See example 2)


Conservation of surface. Performance on the task requires that a
subject recognize that no matter where a given number of objects are located
on a surface, the amount of surface exposed remains the same. An illustration
from the CCS uses a toy farm placed on a table. The dialogue in Frame One
is: "See the little farm." "The cows are all over the table." In Frame
Two the dialogue is: "The cows need to have more grass." "Put the
buildings in the back of the table." In Frame Three the question is: "Is there
more space on the table now?" The response order is: "There is less space
now." "There is the same space." "There is more space now."

Conservation of weight. In the CCS, one of the illustrations involves
two children balancing on a seesaw. In the first frame, the two children
are shown from a distance and one says, "Hey, this is fun. We can go up and
down." In the next frame the second child says, "Let's see what happens when
we stop." In the third frame the two children are shown in a balanced
horizontal position and one child asks, "What will happen if I lie down?"
The three alternatives show the seesaw in several positions with the child
who asked the question in a lying-down position. It should be noted that
the position of the child who is lying down is depicted in such a way as to
indicate no change in the distance between himself and the fulcrum (seesaw
center post) so as not to alter the leverage relationships. (See example 3)

Egocentricity. In this measure, the subject is asked to picture how
a setting would look from a perspective other than the one from which he is
looking. One illustration from the CCS uses the concept of taking a picture
of a toy barn, silo, and tractor as follows: "See my new camera." "Take a
picture of my farm." "I'll take the picture from over here" (view opposite
that of person who "owns" farm). Frame Three: "What will the picture look
like?" The response frames show the picture taker's viewpoint, the "owner's"
viewpoint and a side view, each with the caption, "It will look like this."
(See example 4)

WATER LEVEL TASK (WLT)

The conservation of the horizontality of water measure utilized here
was introduced by Pascual-Leone (1966, 1970) as a standardized quantifiable
version of the Piagetian test (Piaget & Inhelder, 1968). A more complete
description of the relative parameters of this type of task can be found in
the semantic-pragmatic analysis of the relative strengths of objects in the
field done by Pascual-Leone (1970).

In this study, a special version of Pascual-Leone's group tests by
Pascual-Leone & De Avila (1972) was used. Subjects were presented with
individual booklets which contained five horizontal or vertical two-dimensional
bottles, eight two-dimensional tilted bottles and four three-dimensional
bottles, two of which were also tilted. The subject was asked to draw a
line where the top of the water would be if the bottle were half full and
then to place an "X" in the part that contained the water.

FIGURAL INTERSECTIONS TEST (FIT) 

The figural intersection test is a group-administered paper-and-pencil
test in which subjects are required to place a dot in the intersecting space
of a varying number of geometrical figures. It was developed by Pascual-
Leone and constitutes a figural analogue of Piaget's "Intersection of Classes"
(1932). The type of overlapping figures utilized in this test was originally
devised by Abelson (1911) for another purpose. In a series of unpublished
studies, Pascual-Leone has shown the test to have a high degree of internal
consistency (split-half reliability = .89) as well as being significantly
related to tests of similar logical structure (Pascual-Leone & Smith, 1969).
For example, it has shown a high correlation with the WLT described above.
Combined with the WLT, in the present context, it was taken as an index of
developmental level. This relationship has been previously found in a
series of unpublished studies by Pascual-Leone & Parkinson (1969).
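For readers unfamiliar with the split-half figure quoted above, the following Python sketch shows one common way such a coefficient is computed: items are divided into odd and even halves, the two half-scores are correlated, and the Spearman-Brown formula steps the correlation up to full test length. The source does not say how the split was made or whether this correction was applied, so both are assumptions, and the item matrix below is invented purely for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(item_scores):
    """item_scores holds one list of 0/1 item scores per subject.

    Odd/even split plus Spearman-Brown correction (both assumed here).
    """
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    r = pearson(odd_half, even_half)
    return 2 * r / (1 + r)


# Invented scores for four subjects on six items (not data from the study).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]
print(round(split_half_reliability(scores), 2))
```

The correction matters because each half contains only half the items, so the raw half-to-half correlation understates the reliability of the full-length test.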


SERIAL TASK (ST) 

The serial task (De Avila, 1971) is a short-term memory task which
is individually administered in two phases. First, subjects are pre-exposed
to the stimulus materials used in a second testing phase. In the pre-exposure
or pre-training phase, each subject is shown a series of 10 different 35 mm.
color slide transparencies of pictures depicting a donkey, house, airplane,
etc. Subjects sit facing a screen situated on a wall six feet away. The
10 illustrations are presented by means of a Kodak 650 Carousel slide
projector. To introduce the task, each subject is shown each figure and asked
to give its name and color (i.e., "a yellow hat"). Following this initial
introductory phase and after the subject was able to correctly identify each
figure ten times when presented in rapid random succession, the testing
phase was begun.

The test phase was conducted in a "free recall" manner (Adams, 1967)
where, without any prior knowledge of the length of a list, the subject
was asked to reproduce the list, ignoring the order in which the individual
items were presented. Subjects were shown a series of individually presented
figures terminated by a blank slide, and asked to tell the experimenter
what they saw. The exposure time for each individual slide was 750 msec.
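Because free recall ignores presentation order, a trial can be scored by comparing sets of items rather than sequences. The short Python sketch below illustrates the idea. The scoring rule (one point per distinct presented item that is reported), the function name `free_recall_score`, and every item label other than "yellow hat" are assumptions made for illustration; the source does not state how responses were scored.

```python
def free_recall_score(presented, recalled):
    """Count distinct presented items that were reported, in any order.

    Labels are lower-cased and stripped so "Yellow hat " matches "yellow hat".
    Intrusions (items never shown) and repeated reports add nothing.
    """
    shown = {item.strip().lower() for item in presented}
    reported = {item.strip().lower() for item in recalled}
    return len(shown & reported)


# Hypothetical trial: only "yellow hat" appears in the source text.
presented = ["yellow hat", "red house", "blue airplane", "green donkey"]
recalled = ["Blue airplane", "yellow hat", "yellow hat", "brown dog"]
print(free_recall_score(presented, recalled))  # 2
```

A serial-recall score would instead compare the two lists position by position, which is exactly what the free recall instructions are meant to avoid.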



